Abstract

In nonlinear state-space models, sequential learning about the hidden state can proceed by particle filtering when the density of the observation conditional on the state is available analytically (e.g. Gordon et al. 1993). This condition need not hold in complex environments, such as the incomplete-information equilibrium models considered in financial economics. In this paper, we make two contributions to the learning literature. First, we introduce a new filtering method, the state-observation sampling (SOS) filter, for general state-space models with intractable observation densities. Second, we develop an indirect inference-based estimator for a large class of incomplete-information economies. We demonstrate the good performance of these techniques on an asset pricing model with investor learning applied to over 80 years of daily equity returns.

Keywords: Hidden Markov model, particle filter, state-observation sampling, learning, indirect inference, forecasting, state space model, value at risk.
1 Introduction



Sequential learning by economic agents is a powerful mechanism that theoretically explains key properties of asset returns, aggregate performance and other equilibrium outcomes (e.g., Pastor and Veronesi, 2009a).$^1$ In order to use these models in practice, for instance to forecast and price assets, a crucial question arises: How can we track agent beliefs? A natural possibility is to consider particle filters, a large class of sequential Monte Carlo methods designed to track a hidden Markov state from a stream of partially revealing observations (e.g. Gordon, Salmond, and Smith, 1993; Johannes and Polson, 2009; Pitt and Shephard, 1999). Existing filtering methods, however, are based on the assumption that the density of the observation conditional on the hidden state (called observation density) is available in closed form up to a normalizing constant. This assumption is unfortunately not satisfied in incomplete-information economies. In this paper, we introduce the state-observation sampling (SOS) filter, a novel sequential Monte Carlo method for general state space models with intractable observation densities. In addition, we develop an indirect inference-based estimator (Gourieroux, Monfort and Renault 1993; Smith, 1993) for the structural parameters of an incomplete-information economy.

Since their introduction by Gordon, Salmond, and Smith (1993), particle filters have considerably expanded the range of applications of hidden Markov models and now pervade fields as diverse as engineering, genetics, statistics (Andrieu and Doucet, 2002; Chopin, 2004; Kuensch, 2005), finance (e.g. Kim, Shephard and Chib, 1998; Johannes, Polson, and Stroud, 2009), and macroeconomics (Fernandez-Villaverde and Rubio-Ramirez, 2007; Fernandez-Villaverde et al., 2009; Hansen, Polson, ...).$^2$ These methods provide estimates of the distribution of a hidden Markov state $s_t$ conditional on a time series of observations $R_t = (r_1, \ldots, r_t)$, $r_t \in \mathbb{R}^{n_R}$, by way of a set of "particles" $(s_t^{(1)}, \ldots, s_t^{(N)})$. In the original sampling and importance resampling algorithm of Gordon, Salmond, and Smith (1993), the construction of the date-$t$ filter from the date-$(t-1)$ particles proceeds in two steps. In the mutation phase, a new set of particles is obtained by drawing a hidden state $\tilde{s}_t^{(n)}$ from each date-$(t-1)$ particle $s_{t-1}^{(n)}$ under the transition kernel of the Markov state. Given a new observation $r_t$, the particles are then resampled using weights that are proportional to the observation density $f_R(r_t \mid \tilde{s}_t^{(n)}, R_{t-1})$. Important refinements of the algorithm include sampling from an auxiliary model in the mutation phase (Pitt and Shephard, 1999), or implementing variance-reduction techniques such as stratified (Kitagawa 1996) and residual (Liu and Chen 1998) resampling.

1 In financial economics, investor learning has been used to explain phenomena as diverse as the level and volatility of equity prices, return predictability, portfolio choice, mutual fund flows, firm profitability following initial public offerings, and the performance of venture capital investments. In particular, the portfolio and pricing implications of learning are investigated in Brennan (1998), Brennan and Xia (2001), Calvet and Fisher (2007), David (1997), Guidolin and Timmermann (2003), Hansen (2007), Pastor and Veronesi (2009b), Timmermann (1993, 1996), and Veronesi (1999, 2000). We refer the reader to Pastor and Veronesi (2009a) for a recent survey of learning in finance.

2 Advances in particle filtering methodology include Andrieu, Doucet, and Holenstein (2010), Del Moral (2004), Fearnhead and Clifford (2003), Gilks and Berzuini (2001), Godsill, Doucet, and West (2004), and Storvik (2002). Particle filters have received numerous applications in finance, such as model diagnostics (Chib, Nardari, and Shephard, 2002), simulated likelihood estimation (Pitt, 2005), volatility forecasting (Calvet, Fisher, and Thompson, 2006), and derivatives pricing (Christoffersen, Jacobs, and Mimouni 2007). See Cappe, Moulines and Ryden (2005), Doucet and Johansen (2008), and Johannes and Polson (2009) for recent reviews.

A common feature of existing filters is the requirement that the observation density $f_R(r_t \mid s_t, R_{t-1})$ be available analytically up to a normalizing constant. This condition need not hold in economic models in which equilibrium conditions can create complex nonlinear relationships between observations and the underlying state of the economy. In the special case when the state $s_t$ evolves in a Euclidean space $\mathbb{R}^{n_S}$ and has a continuous distribution, a possible solution is to estimate each observation density $f_R(r_t \mid \tilde{s}_t^{(n)}, R_{t-1})$, $n \in \{1, \ldots, N\}$, by nonparametric methods (Rossi and Vila, 2006, 2009). This approach is numerically challenging because $N$ conditional densities, and therefore $2N^2$ kernels, must be evaluated every period. Furthermore, the rate of convergence decreases both with the dimension of the state space, $n_S$, and the dimension of the observation space, $n_R$, which indicates that the algorithm is prone to the curse of dimensionality.

The present paper develops a novel particle filter for general state space models that does not require the calculation of the observation density. This new method, which we call the State-Observation Sampling (SOS) filter, consists of simulating a state and a pseudo-observation $(\tilde{s}_t^{(n)}, \tilde{r}_t^{(n)})$ from each date-$(t-1)$ particle. In the resampling stage, we assign to each particle $\tilde{s}_t^{(n)}$ an importance weight determined by the proximity between the pseudo-observation $\tilde{r}_t^{(n)}$ and the actual observation $r_t$. We quantify proximity by a kernel of the type considered in nonparametric statistics:

$$K_{h_t}(r_t - \tilde{r}_t^{(n)}) = \frac{1}{h_t^{n_R}} K\!\left(\frac{r_t - \tilde{r}_t^{(n)}}{h_t}\right),$$

where $h_t$ is a bandwidth, and $K$ is a probability density function. The resampling stage tends to select states associated with pseudo-observations in the neighborhood of the actual data. SOS requires the calculation of only $N$ kernels each period and makes no assumptions on the characteristics of the state space, which may or may not be Euclidean. We demonstrate that as the number of particles $N$ goes to infinity, the filter converges to the target distribution under a wide range of conditions on the bandwidth $h_t$. The root mean squared error of moments computed using the filter decays at the rate $N^{-2/(n_R+4)}$, that is at the same rate as the kernel density estimator of a random vector on $\mathbb{R}^{n_R}$. The asymptotic rate of convergence is thus invariant to the size of the state space, indicating that SOS overcomes a form of the curse of dimensionality. We also prove that the SOS filter provides consistent estimates of the likelihood function.

We next develop inference methods for incomplete-information equilibrium models. To clarify the exposition, we focus on a class of recursive incomplete-information economies parameterized by $\theta \in \Theta$, which nests the examples of Brandt, Zeng, and Zhang (2004), Calvet and Fisher (2007), David and Veronesi (2006), Lettau, Ludvigson and Wachter (2008), Moore and Schaller (1996) and van Nieuwerburgh and Veldkamp (2006). We consider three levels of information, which correspond to nature, an agent and the econometrician. Figure 1 illustrates the information structure. At the beginning of every period $t$, nature selects a Markov state of nature $M_t$ and a vector of fundamentals or signals $x_t$, whose distribution is contingent on the state of nature. The agent observes the signal $x_t$, and computes the conditional probability distribution ("belief") $\Pi_t = \Pi(x_t, \Pi_{t-1})$, for instance by using Bayes' rule. According to her beliefs and signal, the agent also computes a data point $r_t = \mathcal{R}(x_t, \Pi_t, \Pi_{t-1}; \theta)$, which may for example include asset returns, prices, or production decisions. The econometrician observes the data point $r_t$ and aims to track the hidden state $s_t = (M_t, \Pi_t)$ of the learning economy.

We can apply the SOS filter to estimate the distribution of the state of the learning economy conditional on the observed data and the structural parameter $\theta$. We propose an estimation procedure for $\theta$ based on indirect inference, a method introduced by Gourieroux, Monfort and Renault (1993) and Smith (1993) that imputes the structural parameters of a model via an auxiliary estimator (e.g. Calzolari, Fiorentini and Sentana 2004; Czellar, Karolyi and Ronchetti 2007; Czellar and Ronchetti 2010; Dridi, Guay and Renault 2007; Genton and Ronchetti 2003; Heggland and Frigessi 2004). In our context, the full-information version of the economy, in which the state of nature $M_t$ is directly observed by the agent, is a natural building block of the auxiliary estimator. When the state of nature takes finitely many values, the Bayesian filter and the likelihood of the full-information model are available analytically (e.g. Hamilton, 1989). Similarly, when the state of nature $M_t$ has an infinite support, a full-information economy with discretized $M_t$ can be used. Given these properties, we define the auxiliary estimator by expanding the full-information economy's maximum likelihood estimator with a set of statistics that the incomplete-information model is designed to capture.

NATURE: sets the state $M_t$
    |  signals $x_t$
    v
AGENT: infers belief $\Pi_t$ about $M_t$
    |  data $r_t$
    v
ECONOMETRICIAN: observes $r_t$, infers $(M_t, \Pi_t)$

Figure 1: Information structure.

We demonstrate the good performance of our techniques on a structural model of daily 
equity returns. Because the rich dynamics of the return series requires a large state space, 
we base our analysis on the multifrequency learning economy of Calvet and Fisher ("CF" 
2007). We verify by Monte Carlo simulation that the SOS filter accurately tracks the 
state of the learning economy and provides remarkably precise estimates of the likelihood 
function. The indirect inference estimator is also shown to perform well in finite samples. 
We estimate the structural model on the daily excess returns of the CRSP U.S. value- 
weighted index between 1926 and 1999. For the out-of-sample period (2000-2009), the 
incomplete-information model provides accurate value-at-risk forecasts, which significantly 
outperform the predictions obtained from historical simulations, GARCH(1,1), and the 
full-information (FI) model. 

The paper is organized as follows. Section 2 defines the SOS filter for general state 
space models. In section 3, we develop an indirect inference estimator for recursive learning 
economies. Section 4 applies these methods to a multifrequency investor learning model; we 
verify the accuracy of our inference methodology by Monte Carlo simulations, and conduct 
inference on the daily returns of a U.S. aggregate equity index between 1926 and 2009. 
Section 5 concludes.

2 The State-Observation Sampling (SOS) Filter 
2.1 Definition 

We consider a discrete-time stochastic system defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Time is discrete and indexed by $t = 0, 1, \ldots, \infty$. We consider a Markov process $s_t$ defined on a measurable space $(\mathbb{S}, \mathcal{F}_{\mathbb{S}})$. For expositional simplicity, we assume in this subsection that $\mathbb{S} = \mathbb{R}^{n_S}$.

The econometrician receives every period an observation $r_t \in \mathbb{R}^{n_R}$. Let $R_{t-1} = (r_1, \ldots, r_{t-1})$ denote the vector of observations up to date $t-1$. The building block of our model is the conditional density of $(s_t, r_t)$ given $(s_{t-1}, R_{t-1})$:

$$f_{S,R}(s_t, r_t \mid s_{t-1}, R_{t-1}). \tag{2.1}$$

Let $f_{S_0}$ denote a prior over the state space. The inference problem consists of estimating the density of the latent state $s_t$ conditional on the set of current and past observations:

$$f_S(s_t \mid R_t)$$

at all $t \geq 1$.

A large literature proposes estimation by way of a particle filter, that is a finite set of points $(s_t^{(1)}, \ldots, s_t^{(N)})$ that targets $f_S(s_t \mid R_t)$. The sampling importance resampling method of Gordon, Salmond, and Smith (1993) is based on Bayes' rule:

$$f_S(s_t \mid R_t) = \frac{f_R(r_t \mid s_t, R_{t-1}) \, f_S(s_t \mid R_{t-1})}{f_R(r_t \mid R_{t-1})}.$$

The recursive construction begins by drawing $N$ independent states $s_0^{(1)}, \ldots, s_0^{(N)}$ from $f_{S_0}$. Given the date-$(t-1)$ filter $(s_{t-1}^{(1)}, \ldots, s_{t-1}^{(N)})$, the construction of the date-$t$ filter proceeds in two steps. First, we sample $\tilde{s}_t^{(n)}$ from $s_{t-1}^{(n)}$ using the transition kernel of the Markov process. Second, in the resampling step, we sample $N$ particles $(s_t^{(1)}, \ldots, s_t^{(N)})$ from $(\tilde{s}_t^{(1)}, \ldots, \tilde{s}_t^{(N)})$ with normalized importance weights

$$p_t^{(n)} = \frac{f_R(r_t \mid \tilde{s}_t^{(n)}, R_{t-1})}{\sum_{n'=1}^{N} f_R(r_t \mid \tilde{s}_t^{(n')}, R_{t-1})}. \tag{2.2}$$

Under a wide range of conditions, the sample mean $N^{-1} \sum_{n=1}^{N} \Phi(s_t^{(n)})$ converges to $E[\Phi(s_t) \mid R_t]$ for any bounded measurable function $\Phi$.$^3$
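The two-step construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy model (a scalar AR(1) state observed with Gaussian noise), its parameter values, and the names `sir_step` and `gaussian_pdf` are hypothetical and do not come from the paper. The point is that the weights require the observation density in closed form.

```python
import math
import random

def gaussian_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def sir_step(particles, r_t, rng, phi=0.9, sd_state=1.0, sd_obs=0.5):
    """One mutation + resampling step for the toy model
    s_t = phi * s_{t-1} + N(0, sd_state^2),  r_t = s_t + N(0, sd_obs^2)."""
    # Mutation: propagate each particle through the transition kernel.
    mutated = [phi * s + rng.gauss(0.0, sd_state) for s in particles]
    # Weights proportional to the (tractable) observation density, then normalized.
    raw = [gaussian_pdf(r_t, s, sd_obs) for s in mutated]
    total = sum(raw)
    weights = [w / total for w in raw]
    # Multinomial resampling with the normalized importance weights.
    return rng.choices(mutated, weights=weights, k=len(mutated))

rng = random.Random(0)
particles = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # draws from the prior
for r_t in [0.3, 0.8, 1.1, 0.7]:                        # hypothetical observations
    particles = sir_step(particles, r_t, rng)
filtered_mean = sum(particles) / len(particles)         # estimate of E[s_t | R_t]
```

The call to `gaussian_pdf` is the step that becomes unavailable when the observation density is intractable.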

The sampling and importance resampling algorithm, and its various refinements, assume that the observation density $f_R(r_t \mid s_t, R_{t-1})$ is readily available up to a normalizing constant. This is a restrictive assumption in many applications, such as the incomplete-information economies considered in later sections.

We propose a solution to this difficulty when it is possible to simulate from (2.1). Our filter makes no assumption on the tractability of $f_{S,R}(\cdot \mid s_{t-1}, R_{t-1})$, and in fact does not even require that the transition kernel of the Markov state $s_t$ be available explicitly. The principle of our new filter is to simulate from each $s_{t-1}^{(n)}$ a state-observation pair $(\tilde{s}_t^{(n)}, \tilde{r}_t^{(n)})$, and then select particles $\tilde{s}_t^{(n)}$ associated with pseudo-observations $\tilde{r}_t^{(n)}$ that are close to the actual data point $r_t$. The definition of the importance weights is based on Bayes' rule applied to the joint distribution of $\tilde{r}_t^{(n)}, \tilde{s}_t^{(n)}, s_{t-1}^{(n)}$ conditional on $R_t$:

$$f(\tilde{r}_t^{(n)}, \tilde{s}_t^{(n)}, s_{t-1}^{(n)} \mid R_t) = \frac{\delta(r_t - \tilde{r}_t^{(n)}) \, f_{S,R}(\tilde{s}_t^{(n)}, \tilde{r}_t^{(n)} \mid s_{t-1}^{(n)}, R_{t-1}) \, f_S(s_{t-1}^{(n)} \mid R_{t-1})}{f_R(r_t \mid R_{t-1})}, \tag{2.3}$$

where $\delta$ denotes the Dirac distribution on $\mathbb{R}^{n_R}$. Since the Dirac distribution produces degenerate weights, we consider a kernel $K$ with the following properties.

Assumption 1 (Kernel). The function $K : \mathbb{R}^{n_R} \to \mathbb{R}_+$ satisfies:

(i) $\int K(u)\,du = 1$;

(ii) $\int u K(u)\,du = 0$;

(iii) $A(K) = \int \|u\|^2 K(u)\,du < \infty$;

(iv) $B(K) = \int [K(u)]^2\,du < \infty$.



3 See Crisan and Doucet (2002) for an excellent survey on the convergence of particle filters. 
For any $r \in \mathbb{R}^{n_R}$, let

$$K_{h_t}(r) = \frac{1}{h_t^{n_R}} K\!\left(\frac{r}{h_t}\right)$$

denote the corresponding kernel with bandwidth $h_t$ at date $t$. The kernel $K_{h_t}$ converges to the Dirac distribution as $h_t$ goes to zero, which we use to approximate (2.3). This suggests the following algorithm.

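SOS filter

Step 1 (State-observation sampling): For every $n = 1, \ldots, N$, we simulate a state-observation pair $(\tilde{s}_t^{(n)}, \tilde{r}_t^{(n)})$ from $f_{S,R}(\cdot \mid s_{t-1}^{(n)}, R_{t-1})$.

Step 2 (Importance weights): We observe the new data point $r_t$ and compute

$$p_t^{(n)} = \frac{K_{h_t}(r_t - \tilde{r}_t^{(n)})}{\sum_{n'=1}^{N} K_{h_t}(r_t - \tilde{r}_t^{(n')})}, \quad n = 1, \ldots, N.$$

Step 3 (Multinomial resampling): For every $n = 1, \ldots, N$, we draw $s_t^{(n)}$ from $\tilde{s}_t^{(1)}, \ldots, \tilde{s}_t^{(N)}$ with importance weights $p_t^{(1)}, \ldots, p_t^{(N)}$.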



The state-observation pairs $\{(\tilde{s}_t^{(n)}, \tilde{r}_t^{(n)})\}_{n=1,\ldots,N}$ constructed in step 1 provide a discrete approximation to the conditional distribution of $(s_t, r_t)$ given the data $R_{t-1}$. In step 2, we construct a measure of the proximity between the pseudo and the actual data points, and in step 3 we select particles for which this measure is large. The variance of multinomial resampling in step 3 can be reduced and computational speed can be improved by alternatives such as residual (Liu and Chen, 1998) or stratified (Kitagawa, 1996) resampling. In section 4, we obtain good results with a combined residual-stratified approach.$^4$ The convergence proof below applies equally well to these alternatives.
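Steps 1 to 3 can be sketched as follows. This is a minimal illustration, not the implementation used in section 4: the simulator `simulate_pair` is a hypothetical toy model (scalar AR(1) state plus Gaussian noise, so $n_R = 1$), the Gaussian kernel is one choice that satisfies Assumption 1, and the bandwidth constant is arbitrary. The filter only calls the simulator and never evaluates an observation density.

```python
import math
import random

def gaussian_kernel(u):
    """Standard normal density; it satisfies conditions (i)-(iv) of Assumption 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def simulate_pair(s_prev, rng, phi=0.9, sd_state=1.0, sd_obs=0.5):
    """Hypothetical simulator: draws (state, pseudo-observation) given s_{t-1}."""
    s_new = phi * s_prev + rng.gauss(0.0, sd_state)
    r_sim = s_new + rng.gauss(0.0, sd_obs)
    return s_new, r_sim

def sos_step(particles, r_t, h_t, rng):
    """One SOS step. Returns the resampled particles and the average kernel
    value (1/N) * sum_n K_h(r_t - r_sim_n)."""
    n = len(particles)
    # Step 1: state-observation sampling.
    pairs = [simulate_pair(s, rng) for s in particles]
    # Step 2: kernel weights K_h(r_t - r_sim) = K((r_t - r_sim) / h) / h for n_R = 1.
    raw = [gaussian_kernel((r_t - r_sim) / h_t) / h_t for _, r_sim in pairs]
    total = sum(raw)
    weights = [w / total for w in raw]
    # Step 3: multinomial resampling of the simulated states.
    states = [s for s, _ in pairs]
    return rng.choices(states, weights=weights, k=n), total / n

rng = random.Random(1)
n_particles = 5000
h = n_particles ** (-1.0 / 5.0)     # bandwidth shrinking with N; constant 1 is arbitrary
particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
log_lik = 0.0
for r_t in [0.3, 0.8, 1.1, 0.7]:    # hypothetical observations
    particles, density_hat = sos_step(particles, r_t, h, rng)
    log_lik += math.log(density_hat)
sos_mean = sum(particles) / len(particles)
```

Replacing `simulate_pair` with any other simulator of the pair (state, observation) leaves `sos_step` unchanged, which is the sense in which the method does not depend on the tractability of the model.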

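4 We select $\sum_{n=1}^{N} \lfloor N p_t^{(n)} \rfloor$ particles deterministically by setting $\lfloor N p_t^{(n)} \rfloor$ particles equal to $\tilde{s}_t^{(n)}$ for every $n \in \{1, \ldots, N\}$, where $\lfloor \cdot \rfloor$ denotes the floor of a real number. The remaining $N_{r,t} = N - \sum_{n=1}^{N} \lfloor N p_t^{(n)} \rfloor$ particles are selected by the stratified sampling that produces $\tilde{s}_t^{(n)}$ with probability $q_t^{(n)} = (N p_t^{(n)} - \lfloor N p_t^{(n)} \rfloor)/N_{r,t}$, $n = 1, \ldots, N$. That is, for every $k \in \{1, \ldots, N_{r,t}\}$, we draw $U_k$ from the uniform distribution on $\left(\frac{k-1}{N_{r,t}}, \frac{k}{N_{r,t}}\right]$, and select the particle $\tilde{s}_t^{(n)}$ such that $U_k \in \left(\sum_{j=1}^{n-1} q_t^{(j)}, \sum_{j=1}^{n} q_t^{(j)}\right]$.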
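A combined residual-stratified scheme of this kind could look as follows in Python. The function name and the example inputs are hypothetical; the sketch assumes normalized weights and strata of the form $((k-1)/N_r, k/N_r]$.

```python
import math
import random

def residual_stratified_resample(states, weights, rng):
    """Combined residual-stratified resampling; weights must sum to one."""
    n = len(states)
    # Deterministic part: floor(N * p_n) copies of each particle.
    copies = [math.floor(n * w) for w in weights]
    out = []
    for s, c in zip(states, copies):
        out.extend([s] * c)
    n_rest = n - len(out)
    if n_rest == 0:
        return out
    # Residual probabilities q_n = (N p_n - floor(N p_n)) / N_rest.
    q = [(n * w - c) / n_rest for w, c in zip(weights, copies)]
    # Stratified part: one uniform draw in each stratum (k/N_rest, (k+1)/N_rest].
    cum, idx = q[0], 0
    for k in range(n_rest):
        u = (k + 1 - rng.random()) / n_rest
        # Advance to the particle whose cumulative residual probability covers u.
        while u > cum and idx < n - 1:
            idx += 1
            cum += q[idx]
        out.append(states[idx])
    return out

rng = random.Random(2)
resampled = residual_stratified_resample(
    ["a", "b", "c", "d"], [0.5, 0.25, 0.125, 0.125], rng
)
```

In this example two copies of "a" and one copy of "b" are selected deterministically, and the single remaining slot goes to "c" or "d" with equal probability.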
2.2 Extension and Convergence

The SOS filter easily extends to the case of a general measurable state space $\mathbb{S}$. The building blocks of the model are the conditional probability measure of $(s_t, r_t)$ given $(s_{t-1}, R_{t-1})$:

$$g(\cdot \mid s_{t-1}, R_{t-1}),$$

and a prior measure $\lambda_0$ over the state space. The SOS filter targets the probability measure of the latent state $s_t$ conditional on the set of current and past observations, $\lambda(\cdot \mid R_t)$. The SOS filter is defined as in Section 2.1, where in step 1 we sample $(\tilde{s}_t^{(n)}, \tilde{r}_t^{(n)})$ from the conditional measure $g(\cdot \mid s_{t-1}^{(n)}, R_{t-1})$.

We now specify conditions under which for an arbitrary state space $\mathbb{S}$ and a fixed history $R_T = (r_1, \ldots, r_T)$, $T < \infty$, the SOS filter converges in mean squared error to the target $\lambda(\cdot \mid R_t)$ as the number of particles $N$ goes to infinity.

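Assumption 2 (Conditional Distributions). The observation process satisfies the following hypotheses:

(i) the conditional density $f_R(\tilde{r}_t \mid s_{t-1}, R_{t-1})$ exists and

$$\kappa_t = \sup\{ f_R(\tilde{r}_t \mid s_{t-1}, R_{t-1}) ; (s_{t-1}, \tilde{r}_t) \in \mathbb{S} \times \mathbb{R}^{n_R} \} < \infty;$$

(ii) the observation density $f_R(\tilde{r}_t \mid s_t, R_{t-1})$ is well-defined and there exists $\kappa'_t \in \mathbb{R}_+$ such that:

$$\left| f_R(\tilde{r}_t \mid s_t, R_{t-1}) - f_R(r_t \mid s_t, R_{t-1}) - \frac{\partial f_R}{\partial r_t'}(r_t \mid s_t, R_{t-1}) (\tilde{r}_t - r_t) \right| \leq \kappa'_t \, \|\tilde{r}_t - r_t\|^2$$

for all $(s_t, \tilde{r}_t) \in \mathbb{S} \times \mathbb{R}^{n_R}$ and $t \leq T$.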



Assumption 3 (Bandwidth). The bandwidth is a function of $N$, $h_t = h_t(N)$, and satisfies

(i) $\lim_{N \to \infty} h_t(N) = 0$,

(ii) $\lim_{N \to \infty} N [h_t(N)]^{n_R} = +\infty$,

for all $t = 1, \ldots, T$.



We establish the following result in the appendix. 



Theorem 4 (Convergence of the SOS Filter). Under assumptions 1 and 2 and for every $t$ and $N \geq 1$, there exists $U_t(N) \in \mathbb{R}_+$ such that

$$E\left\{\left[\frac{1}{N}\sum_{n=1}^{N} K_{h_t}(r_t - \tilde{r}_t^{(n)}) - f_R(r_t \mid R_{t-1})\right]^2\right\} \leq [f_R(r_t \mid R_{t-1})]^2 \, U_t(N), \tag{2.4}$$

where the expectation is over all the realizations of the random particle method. Furthermore, for any bounded measurable function $\Phi : \mathbb{S} \to \mathbb{R}$,

$$\mathrm{MSE}_t = E\left\{\left[\frac{1}{N}\sum_{n=1}^{N} \Phi(s_t^{(n)}) - E[\Phi(s_t) \mid R_t]\right]^2\right\} \leq U_t(N) \, \|\Phi\|^2, \tag{2.5}$$

where $\|\Phi\| = \sup_{s \in \mathbb{S}} |\Phi(s)|$. If assumption 3 also holds, then

$$\lim_{N \to \infty} U_t(N) = 0,$$

and the filter converges in mean squared error. Furthermore, if the bandwidth sequence is of the form $h_t(N) = h_t(1) N^{-1/(n_R+4)}$, then $U_t(N)$ decays at rate $N^{-4/(n_R+4)}$ and the root mean squared error $\mathrm{MSE}_t^{1/2}$ at rate $N^{-2/(n_R+4)}$ for all $t$.
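The rate in Theorem 4 mirrors the usual bias-variance trade-off of kernel estimators. The following back-of-the-envelope calculation is a heuristic sketch, not the proof given in the appendix. It assumes the standard orders of magnitude for a kernel estimator with bandwidth $h$ on $\mathbb{R}^{n_R}$ (squared bias of order $h^4$, variance of order $1/(N h^{n_R})$); the constants $c_1$ and $c_2$ are placeholders.

```latex
\begin{align*}
\mathrm{MSE}(h) &\approx c_1 h^{4} + \frac{c_2}{N h^{n_R}}, \\
\frac{\partial\,\mathrm{MSE}}{\partial h} = 0
  &\;\Longrightarrow\; h^{n_R+4} \propto N^{-1}
  \;\Longrightarrow\; h \propto N^{-1/(n_R+4)}, \\
\mathrm{MSE} &\propto N^{-4/(n_R+4)}, \qquad
\mathrm{RMSE} \propto N^{-2/(n_R+4)}, \\
N h^{n_R} &\propto N^{1 - n_R/(n_R+4)} = N^{4/(n_R+4)} \to \infty .
\end{align*}
```

With a scalar observation ($n_R = 1$), the root mean squared error is of order $N^{-2/5}$, so multiplying the number of particles by 100 divides it by roughly $100^{0.4} \approx 6.3$. The last line shows that a bandwidth of this form shrinks to zero while $N h^{n_R}$ diverges, as Assumption 3 requires.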



By (2.4), the kernel estimator

$$\hat{f}_R(r_t \mid R_{t-1}) = \frac{1}{N} \sum_{n=1}^{N} K_{h_t}(r_t - \tilde{r}_t^{(n)}) \tag{2.6}$$

converges to the conditional density of $r_t$ given past observations. Consequently, we can estimate the log-likelihood function by $\sum_{t=1}^{T} \ln \hat{f}_R(r_t \mid R_{t-1})$, and provide a plug-in bandwidth in the online Appendix. We will illustrate in section 4 the finite-sample accuracy of the SOS filter.
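When the bandwidth is small or a data point is extreme, every kernel value in the average can underflow to zero in floating point, and the logarithm is then undefined. One common remedy, sketched below for a scalar Gaussian kernel, is to work with log kernel values and a log-sum-exp. The function name and the numerical inputs are hypothetical.

```python
import math

def log_kernel_density(r_t, pseudo_obs, h):
    """log of (1/N) * sum_n K_h(r_t - r_n) for a scalar Gaussian kernel,
    computed with the log-sum-exp trick to avoid underflow."""
    log_terms = [
        -0.5 * ((r_t - r) / h) ** 2 - math.log(h) - 0.5 * math.log(2.0 * math.pi)
        for r in pseudo_obs
    ]
    m = max(log_terms)
    return m + math.log(sum(math.exp(v - m) for v in log_terms)) - math.log(len(pseudo_obs))

# Pseudo-observations far from the data point: exp(-1250) underflows to 0.0 in
# double precision, so the naive average of kernel values would be exactly zero.
value = log_kernel_density(0.0, [50.0, 51.0], 1.0)
```

Summing `log_kernel_density` over dates gives the log-likelihood estimate described above.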






3 Recursive Learning Economies 



We consider a class of discrete-time stochastic economies defined at $t = 0, \ldots, \infty$ on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and parameterized by $\theta \in \Theta \subseteq \mathbb{R}^p$, $p \geq 1$.

3.1 Information Structure 

In every period $t$, we define three levels of information, which respectively correspond to nature, a Bayesian agent, and the econometrician. Figure 1 illustrates the information structure.

3.1.1 Nature 

A state of nature $M_t$ drives the fundamentals of the economy. We assume that $M_t$ follows a first-order Markov chain on the set of mutually distinct states $\{m^1(\theta), \ldots, m^d(\theta)\}$. For every $i, j \in \{1, \ldots, d\}$, we denote by $a_{i,j}(\theta) = \mathbb{P}(M_t = m^j(\theta) \mid M_{t-1} = m^i(\theta); \theta)$ the transition probability from state $i$ to state $j$. We assume that the Markov chain $M_t$ is irreducible, aperiodic, positive recurrent, and therefore ergodic. For notational simplicity, we henceforth drop the argument $\theta$ from the states $m^j$ and transition probabilities $a_{i,j}$.

3.1.2 Agent 

At the beginning of every period $t$, the agent observes a signal vector $x_t \in \mathbb{R}^{n_X}$, which is partially revealing on the state of nature $M_t$. The probability density function of the signal conditional on the state of nature, $f_X(x_t \mid M_t; \theta)$, is known to the agent. Let $X_t = (x_1, \ldots, x_t)$ denote the vector of signals received by the agent up to date $t$. For tractability reasons, we make the following hypotheses.

Assumption 5 (Signal). The signal satisfies the following conditions:

(a) $\mathbb{P}(M_t = m^j \mid M_{t-1} = m^i, X_{t-1}; \theta) = a_{i,j}$ for all $i, j$;

(b) $f_X(x_t \mid M_t, M_{t-1}, \ldots, M_0, X_{t-1}; \theta) = f_X(x_t \mid M_t; \theta)$.

The agent knows the structural parameter $\theta$, is Bayesian and uses $X_t$ to compute the conditional probability of the states of nature.
Proposition 6 (Agent Belief). The conditional probabilities $\Pi_t^j = \mathbb{P}(M_t = m^j \mid X_t; \theta)$ satisfy the recursion:

$$\Pi_t^j = \frac{\omega^j(\Pi_{t-1}, x_t; \theta)}{\sum_{i=1}^{d} \omega^i(\Pi_{t-1}, x_t; \theta)}, \tag{3.1}$$

where $\Pi_{t-1} = (\Pi_{t-1}^1, \ldots, \Pi_{t-1}^d)$ and $\omega^j(\Pi_{t-1}, x_t; \theta) = f_X(x_t \mid M_t = m^j; \theta) \sum_{i=1}^{d} a_{i,j} \Pi_{t-1}^i$.

In applications, the agent values assets or makes financial, production or purchasing decisions as a function of the belief vector $\Pi_t$. Our methodology easily extends to learning models with non-Bayesian agents, as in Brandt, Zeng, and Zhang (2004) and Cecchetti, Lam and Mark (2000).
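The belief recursion of Proposition 6 is a one-line Bayes update once the transition matrix and the signal densities are given. The sketch below assumes a hypothetical two-state example in which the signal is Gaussian with a state-dependent standard deviation; the function names and all numerical values are illustrative only.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def update_belief(belief, x_t, trans, signal_density):
    """One step of the belief recursion.

    belief[i]      : probability of state i at date t-1 given past signals
    trans[i][j]    : a_{i,j}, probability of moving from state i to state j
    signal_density : function (x, j) -> density of the signal in state j
    """
    d = len(belief)
    omega = [
        signal_density(x_t, j) * sum(trans[i][j] * belief[i] for i in range(d))
        for j in range(d)
    ]
    total = sum(omega)
    return [w / total for w in omega]

# Hypothetical example: the signal is N(0, 1) in state 0 and N(0, 3^2) in state 1.
trans = [[0.95, 0.05], [0.10, 0.90]]
sds = [1.0, 3.0]
density = lambda x, j: normal_pdf(x, 0.0, sds[j])
after_large = update_belief([0.9, 0.1], 6.0, trans, density)  # large signal
after_small = update_belief([0.1, 0.9], 0.0, trans, density)  # signal at the mean
```

In this example a single large signal moves almost all probability mass to the high-dispersion state, whereas a signal at the mean lowers the probability of that state only from 0.9 to about 0.6.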

The state of the learning economy at a given date $t$ is the mixed variable $s_t = (M_t, \Pi_t)$. The state space is therefore

$$\mathbb{S} = \{m^1, \ldots, m^d\} \times \Delta_+^{d-1}, \tag{3.2}$$

where $\Delta_+^{d-1} = \{\Pi \in \mathbb{R}_+^d \mid \sum_{i=1}^{d} \Pi^i = 1\}$ denotes the $(d-1)$-dimensional unit simplex.

Proposition 7 (State of the Learning Economy). The state of the learning economy, $s_t = (M_t, \Pi_t)$, is first-order Markov. It is ergodic if the transition probabilities between states of nature are strictly positive: $a_{i,j} > 0$ for all $i, j$, and the signal's conditional probability density functions $f_X(x \mid M_t = m^j; \theta)$ are strictly positive for all $x \in \mathbb{R}^{n_X}$ and $j \in \{1, \ldots, d\}$.

The state of the learning economy $s_t$ preserves the first-order Markov structure of the state of nature $M_t$. By Bayes' rule (3.1), the transition kernel of the Markov state $s_t$ is sparse when the dimension of the signal, $n_X$, is lower than the number of states of nature: $n_X < d$. The state $s_t$ is nonetheless ergodic for all values $n_X$ and $d$ under the conditions stated in Proposition 7, which guarantees that the economy is asymptotically independent of the initial state $s_0$.

3.1.3 Econometrician 

Each period, the econometrician observes a data point $r_t \in \mathbb{R}^{n_R}$, which is assumed to be a deterministic function of the agent's signal and conditional probabilities over states of nature:

$$r_t = \mathcal{R}(x_t, \Pi_t, \Pi_{t-1}; \theta). \tag{3.3}$$

We include $\Pi_{t-1}$ in this definition to accommodate the possibility that $r_t$ is a growth rate or return. The parameter vector $\theta \in \mathbb{R}^p$ specifies the states of nature $m^1, \ldots, m^d$, their transition probabilities $(a_{i,j})_{1 \leq i,j \leq d}$, the signal's conditional density $f_X(\cdot \mid M_t, \theta)$, and the data function $\mathcal{R}(x_t, \Pi_t, \Pi_{t-1}; \theta)$. In some applications, it may be useful to add measurement error in (3.3); the estimation procedure of the next section applies equally well to this extension.

3.2 Estimation 

We assume that the data $R_T = (r_1, \ldots, r_T)$ is generated by the incomplete-information (II) economy with parameter $\theta^*$ described above. Estimation faces several challenges. The transition kernel of the Markov state $s_t$ and the log-likelihood function $L_{II}(\theta \mid R_T)$ are not available analytically. Furthermore, the observation density $f_R(r_t \mid s_t, R_{t-1})$ is not available in closed form either, because the signal $x_t$ drives the data point $r_t = \mathcal{R}(x_t, \Pi_t, \Pi_{t-1}; \theta)$ both directly and indirectly through the belief $\Pi_t = \Pi(x_t, \Pi_{t-1})$, creating a highly nonlinear relationship between the state and the observation.

The learning model can, however, be conveniently simulated. Given a state $s_{t-1} = (M_{t-1}, \Pi_{t-1})$, we can: (i) sample $M_t$ from $M_{t-1}$ using the transition probabilities $a_{i,j}$; (ii) sample the signal $x_t$ from $f_X(\cdot \mid M_t; \theta)$; (iii) apply Bayes' rule (3.1) to impute the agent's belief $\Pi_t$; and (iv) compute the simulated data point $\tilde{r}_t = \mathcal{R}(x_t, \Pi_t, \Pi_{t-1}; \theta)$. Estimation can therefore proceed by simulation-based methods. Simulated ML based on the SOS filter is a possible approach. As we will see in section 4, however, an accurate approximation of the log-likelihood value $L_{II}(\theta \mid R_T) = \sum_{t=1}^{T} \ln[N^{-1} \sum_{n=1}^{N} K_{h_t}(r_t - \tilde{r}_t^{(n)})]$ may require a large number of particles. For situations where simulated ML is too computationally demanding,$^5$ we now propose an alternative approach based on indirect inference.
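Steps (i) to (iv) translate directly into a simulator of the pair (state, pseudo-observation), which is all the SOS filter needs. The sketch below assumes a hypothetical two-state economy: Gaussian signals with state-dependent standard deviation, and a data function equal to the signal plus the log change of a quantity that is linear in beliefs. None of these choices, names or numbers is taken from the paper.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def simulate_step(m_prev, belief_prev, trans, draw_signal, signal_density, data_fn, rng):
    """Steps (i)-(iv): returns (m_t, belief_t, simulated data point)."""
    d = len(trans)
    m_t = rng.choices(range(d), weights=trans[m_prev], k=1)[0]   # (i) state of nature
    x_t = draw_signal(m_t, rng)                                  # (ii) signal
    omega = [signal_density(x_t, j) * sum(trans[i][j] * belief_prev[i] for i in range(d))
             for j in range(d)]                                  # (iii) Bayes' rule
    total = sum(omega)
    belief_t = [w / total for w in omega]
    r_t = data_fn(x_t, belief_t, belief_prev)                    # (iv) data point
    return m_t, belief_t, r_t

# Hypothetical primitives of a two-state toy economy.
trans = [[0.95, 0.05], [0.10, 0.90]]
sds = [0.01, 0.03]
levels = [25.0, 20.0]   # placeholder valuation levels, one per state

def draw_signal(m, rng):
    return rng.gauss(0.0, sds[m])

def signal_density(x, j):
    return normal_pdf(x, 0.0, sds[j])

def data_fn(x, belief, belief_prev):
    now = sum(q * b for q, b in zip(levels, belief))
    prev = sum(q * b for q, b in zip(levels, belief_prev))
    return x + math.log(now / prev)

rng = random.Random(3)
state, belief = 0, [0.5, 0.5]
path = []
for _ in range(250):
    state, belief, r = simulate_step(state, belief, trans, draw_signal,
                                     signal_density, data_fn, rng)
    path.append(r)
```

Note that the data point depends on the signal both directly and through the updated belief, so its density given the state has no simple closed form even in this small example.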

For each learning model $\theta \in \Theta$, we can define an auxiliary full information (FI) model in which the agent observes both the state of nature $M_t$ and the signal $x_t$. Her conditional probabilities are then $\Pi_t^j = \mathbb{P}(M_t = m^j \mid X_t, M_t; \theta)$ for all $j$. The belief vector reduces to $\Pi_t = 1_{M_t}$, where $1_{M_t}$ denotes the vector whose $j$th component is equal to 1 if $M_t = m^j$ and 0 otherwise, and by (3.3) the full information data point is defined by $r_t = \mathcal{R}(x_t, 1_{M_t}, 1_{M_{t-1}}; \theta)$. The FI model can have fewer parameters than the II model because of the simplification in $\Pi_t$. We therefore consider that the auxiliary FI model is parameterized by $\phi \in \mathbb{R}^q$, where $1 \leq q \leq p$.

5 For instance in the empirical example considered in section 4, we use an SOS filter of size $N = 10^7$ and a dataset of about 20,000 observations. One evaluation of the likelihood function requires the evaluation of 200 billion kernels $K_{h_t}(\cdot)$. Since a typical optimization requires about 500 function evaluations, the simulated ML estimation of the II model would require the evaluation of 100 trillion kernels.

Assumption 8 (Auxiliary Full-Information Economies). The probability density functions $f_{i,j}(r_t; \phi) = f_{R,FI}(r_t \mid M_t = m^j, M_{t-1} = m^i; \phi)$ are available analytically for all $i, j \in \{1, \ldots, d\}$.

Proposition 9 (Full-Information Likelihood). Under assumption 8, the log-likelihood function $L_{FI}(\phi \mid R_T)$ is available analytically.
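One standard way to evaluate such a likelihood is a forward recursion of the Hamilton (1989) type over pairs of consecutive states. The sketch below is an assumption about how this could be coded, not the authors' implementation; `pair_density(r, i, j)` stands for the density of the data point given the previous state $m^i$ and the current state $m^j$, and the example densities are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fi_log_likelihood(returns, trans, init_probs, pair_density):
    """Log-likelihood of r_1..r_T when r_t has density pair_density(r_t, i, j)
    given (M_{t-1} = m^i, M_t = m^j) and M_t is a Markov chain with matrix trans."""
    d = len(trans)
    probs = list(init_probs)        # P(M_{t-1} = m^i | R_{t-1})
    log_lik = 0.0
    for r_t in returns:
        # Joint weight of (M_{t-1} = m^i, M_t = m^j, r_t) given past data.
        joint = [[probs[i] * trans[i][j] * pair_density(r_t, i, j) for j in range(d)]
                 for i in range(d)]
        lik_t = sum(sum(row) for row in joint)   # f(r_t | R_{t-1})
        log_lik += math.log(lik_t)
        # Filtered probabilities of the current state, used at the next date.
        probs = [sum(joint[i][j] for i in range(d)) / lik_t for j in range(d)]
    return log_lik

trans = [[0.9, 0.1], [0.2, 0.8]]
# Sanity check: if the density ignores the state, the result is an i.i.d. log-likelihood.
flat = lambda r, i, j: normal_pdf(r, 0.0, 1.0)
ll_flat = fi_log_likelihood([0.0, 1.0], trans, [0.5, 0.5], flat)
# Hypothetical state-dependent case: dispersion depends on the current state only.
sds = [1.0, 3.0]
by_state = lambda r, i, j: normal_pdf(r, 0.0, sds[j])
ll_state = fi_log_likelihood([0.1, -2.5, 0.3], trans, [0.5, 0.5], by_state)
```

The cost per date is of order $d^2$, which is what makes the full-information likelihood cheap to evaluate when the number of states is moderate.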



The ML estimator of the full-information economy

$$\hat{\phi}_T = \arg\max_{\phi} L_{FI}(\phi \mid R_T) \in \mathbb{R}^q$$

can therefore be conveniently computed.

The indirect-inference estimation of the structural learning model proceeds in two steps. First, we define an auxiliary estimator that includes the full-information MLE. If $q < p$, we also consider a set of $p - q$ statistics $\hat{\eta}_T$ that quantify features of the dataset $R_T$ that the learning model is designed to capture. The auxiliary estimator is defined by

$$\hat{\mu}_T = \begin{bmatrix} \hat{\phi}_T \\ \hat{\eta}_T \end{bmatrix} \in \mathbb{R}^p. \tag{3.4}$$

By construction, $\hat{\mu}_T$ contains as many parameters as the structural parameter $\theta$.$^6$

Second, for any admissible parameter $\theta$, we can simulate a sample path $R_{ST}(\theta)$ of length $ST$, $S \geq 1$, and compute the corresponding pseudo-auxiliary estimator:

$$\hat{\mu}_{ST}(\theta) = \begin{bmatrix} \hat{\phi}_{ST}(\theta) \\ \hat{\eta}_{ST}(\theta) \end{bmatrix}, \tag{3.5}$$

where $\hat{\phi}_{ST}(\theta) = \arg\max_{\phi} L_{FI}[\phi \mid R_{ST}(\theta)]$. We define the indirect inference estimator $\hat{\theta}_T$ by:

$$\hat{\theta}_T = \arg\min_{\theta} \; [\hat{\mu}_{ST}(\theta) - \hat{\mu}_T]' \, \Omega \, [\hat{\mu}_{ST}(\theta) - \hat{\mu}_T], \tag{3.6}$$

where $\Omega$ is a positive definite weighting matrix. When the calculation of the full-information MLE is expensive, the numerical implementation can be accelerated by the efficient method of moments, as is discussed in the appendix.

6 We focus on the exactly identified case to simplify the exposition and because earlier evidence indicates that parsimonious auxiliary models tend to provide more accurate inference in finite samples (e.g. Andersen, Chung, and Sorensen, 1999; Czellar and Ronchetti, 2010). Our approach naturally extends to the overidentified case, which may be useful in cases where it is economically important to match a larger set of statistics.

Our methodology builds on the fact that the full-information economy can be efficiently 
estimated by ML and is therefore a natural candidate auxiliary model. Moreover, the 
theoretical investigation of a learning model often begins with the characterization of the FI 
case, so the estimation method we are proposing follows the natural progression commonly 
used in the literature. 
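The logic of the two-step procedure can be seen in a deliberately simple setting that is unrelated to the learning economy. In the sketch below, which is a textbook-style toy and not the paper's estimator, the structural model is an MA(1) process with parameter `theta`, the auxiliary estimator is the least-squares slope of a first-order autoregression, and the structural parameter is found by a grid search that matches the auxiliary estimate on simulated data of length $ST$ to the one computed on the data. All names and values are hypothetical.

```python
import random

def simulate_ma1(theta, shocks):
    """y_t = e_t + theta * e_{t-1}, built from a fixed list of shocks
    (the same shocks are reused for every theta: common random numbers)."""
    return [shocks[t] + theta * shocks[t - 1] for t in range(1, len(shocks))]

def auxiliary_estimator(y):
    """Auxiliary statistic: least-squares slope of y_t on y_{t-1}, no intercept."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(v * v for v in y[:-1])
    return num / den

def indirect_inference(data, sim_shocks, grid):
    """Grid search for the theta whose simulated auxiliary estimate is closest
    to the auxiliary estimate computed on the data (exactly identified case)."""
    mu_data = auxiliary_estimator(data)
    def distance(theta):
        return (auxiliary_estimator(simulate_ma1(theta, sim_shocks)) - mu_data) ** 2
    return min(grid, key=distance)

rng = random.Random(4)
T, S = 2000, 5
true_theta = 0.5
data = simulate_ma1(true_theta, [rng.gauss(0.0, 1.0) for _ in range(T + 1)])
sim_shocks = [rng.gauss(0.0, 1.0) for _ in range(S * T + 1)]
grid = [k / 100.0 for k in range(0, 96)]      # candidate values in [0, 0.95]
theta_hat = indirect_inference(data, sim_shocks, grid)
```

Holding the simulation shocks fixed across candidate values keeps the objective smooth in `theta`. With a vector of auxiliary statistics, the squared distance would be replaced by a quadratic form in a positive definite weighting matrix.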

We assume that the assumptions 10-12 given in the appendix hold. Gourieroux et al. (1993) and Gourieroux and Monfort (1996) show that under these conditions and assuming the structural model $\theta^*$, the auxiliary estimator $\hat{\mu}_T$ converges in probability to a deterministic function $\mu(\theta^*)$, called the binding function, and $\sqrt{T}\,[\hat{\mu}_T - \mu(\theta^*)] \xrightarrow{d} N(0, W^*)$, where $W^*$ is defined in the appendix. Furthermore, when $S$ is fixed and $T$ goes to infinity, the estimator $\hat{\theta}_T$ is consistent and asymptotically normal:

$$\sqrt{T}\,(\hat{\theta}_T - \theta^*) \xrightarrow{d} N(0, \Sigma),$$

where

(3.7)

The appendix further discusses the numerical implementation of this method.

In this section, we have assumed that the state of nature takes finitely many values. When $M_t$ has an infinite support, we can discretize its distribution and use the corresponding full-information discretized economy as an auxiliary model. The definition and properties of the indirect inference estimator are otherwise identical.
4 Inference in an Asset Pricing Model with Investor Learning

We now apply our methodology to a consumption-based asset pricing model. We adopt 
the Lucas tree economy with regime-switching fundamentals of CF (2007), which we use 
to specify the dynamics of daily equity returns. 

4.1 Specification 

4.1.1 Dynamics of the State of Nature 

The rich dynamics of daily returns requires a large state space. For this reason, we consider that the state is a vector containing $\bar{k}$ components:

$$M_t = (M_{1,t}, \ldots, M_{\bar{k},t})' \in \mathbb{R}_+^{\bar{k}},$$

which follows a binomial Markov Switching Multifractal (CF 2001, 2004, 2008). The components are mutually independent across $k$. Let $M$ denote a Bernoulli distribution that takes either a high value $m_0$ or a low value $2 - m_0$ with equal probability. Given a value $M_{k,t}$ for the $k$th component at date $t$, the next-period multiplier $M_{k,t+1}$ is either:

drawn from the distribution $M$ with probability $\gamma_k$,

equal to its current value $M_{k,t}$ with probability $1 - \gamma_k$.

Since each component of the state vector can take two possible values, the state space contains $d = 2^{\bar{k}}$ elements $m^1, \ldots, m^d$. The transition probabilities $\gamma_k$ are parameterized by

$$\gamma_k = 1 - (1 - \gamma_{\bar{k}})^{b^{k - \bar{k}}}, \quad k = 1, \ldots, \bar{k},$$

where $b > 1$. Thus, $\gamma_{\bar{k}}$ controls the persistence of the highest-frequency component and $b$ determines the spacing between frequencies.
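The dynamics of the state of nature can be simulated component by component. The sketch below assumes the parameterization $\gamma_k = 1 - (1 - \gamma_{\bar{k}})^{b^{k-\bar{k}}}$ for the switching probabilities; the function names and the numerical values of $m_0$, $\gamma_{\bar{k}}$, $b$ and $\bar{k}$ are illustrative and are not estimates from the paper.

```python
import random

def msm_switch_probs(gamma_kbar, b, kbar):
    """Switching probabilities gamma_1..gamma_kbar, assuming
    gamma_k = 1 - (1 - gamma_kbar) ** (b ** (k - kbar))."""
    return [1.0 - (1.0 - gamma_kbar) ** (b ** (k - kbar)) for k in range(1, kbar + 1)]

def msm_step(m_prev, gammas, m0, rng):
    """Each component is redrawn from {m0, 2 - m0} with probability gamma_k,
    and keeps its current value otherwise."""
    out = []
    for m_k, g in zip(m_prev, gammas):
        if rng.random() < g:
            m_k = m0 if rng.random() < 0.5 else 2.0 - m0
        out.append(m_k)
    return out

m0 = 1.4
gammas = msm_switch_probs(gamma_kbar=0.5, b=4.0, kbar=3)
rng = random.Random(5)
state = [m0, m0, m0]
for _ in range(100):
    state = msm_step(state, gammas, m0, rng)
```

With these values the switching probabilities increase with $k$, so the first component is the most persistent and the last one switches most often.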

4.1.2 Bayesian Agent 

The agent receives an exogenous consumption stream $\{C_t\}$ and prices the stock, which is a claim on an exogenous dividend stream $\{D_t\}$. Every period, the agent observes a signal $x_t \in \mathbb{R}^{\bar{k}+2}$ consisting of dividend growth:

$$x_{1,t} = \ln(D_t / D_{t-1}) = g_D - \frac{\sigma_D^2(M_t)}{2} + \sigma_D(M_t)\,\varepsilon_{D,t}, \tag{4.1}$$

consumption growth:

$$x_{2,t} = \ln(C_t / C_{t-1}) = g_C + \sigma_C\,\varepsilon_{C,t}, \tag{4.2}$$

and a noisy version of the state:

$$x_{i+2,t} = M_{i,t} + \sigma_\delta\, z_{i,t}, \quad i = 1, \ldots, \bar{k}. \tag{4.3}$$



The noise parameter $\sigma_\delta \in \mathbb{R}_+$ controls information quality. The stochastic volatility of dividends is given by:



where $\bar\sigma_D \in \mathbb{R}_+$. The innovations $\varepsilon_{D,t}$, $\varepsilon_{C,t}$ and $z_{i,t}$ are jointly normal and have zero means and unit variances. We assume that $\varepsilon_{C,t}$ and $\varepsilon_{D,t}$ have correlation $\rho_{C,D}$, and that all the other correlation coefficients are zero.

Learning about the volatility state $M_t$ is an asymmetric process. For expositional simplicity, assume that the noise parameter $\sigma_\delta$ is large, so that investors learn about $M_t$ primarily through dividend growth. Because large realizations of dividend growth are implausible in a low-volatility regime, learning about a volatility increase tends to be abrupt. Conversely, when volatility switches from a high to a low state, the agent learns only gradually that volatility has gone down, because realizations of dividend growth near the mean are likely outcomes under any $M_t$.

The agent has isoelastic expected utility, $U_0 = \mathbb{E}_0 \sum_{t=0}^{\infty} \delta^t C_t^{1-\alpha}/(1-\alpha)$, where $\delta$ is the discount rate and $\alpha$ is the coefficient of relative risk aversion. In equilibrium, the log interest rate is constant. The stock's price-dividend ratio is negatively related to volatility and linear in the belief vector:




(4.4)

$$Q(\Pi_t) = \sum_{j=1}^{d} Q(m^j)\,\Pi_t^j, \tag{4.5}$$

where the linear coefficients $Q(m^j)$ are available analytically.

Figure 2: Learning Model Simulation. This figure illustrates a sample path simulated from the multifrequency learning model. Each panel corresponds to a different level of information. Nature's price-dividend ratio $Q(M_t)$ is plotted in the top panel, the agent's price-dividend ratio $Q(\Pi_t)$ in the middle panel, and the return $r_t$ (computed by the agent and observed by the econometrician) in the bottom panel.
4.1.3 Econometric Specification of Stock Returns

The econometrician observes the log excess return process:

$$r_t = \ln\!\left[\frac{1 + Q(\Pi_t)}{Q(\Pi_{t-1})}\right] + x_{1,t} - r_f. \tag{4.6}$$



Since learning about volatility is asymmetric, the stock price falls abruptly following a volatility increase (bad news), but increases only gradually after a volatility decrease (good news). The noise parameter $\sigma_\delta$ therefore controls the skewness of stock returns.



4.2 Accuracy of the SOS filter 

We now present the results of Monte Carlo simulations of the SOS particle filter. To simplify the exposition, we consider one-dimensional aggregates of $M_t$ and $\Pi_t$, which summarize economically meaningful information. Specifically, if the agent knew the true state of nature, she would set the price-dividend ratio equal to $Q(M_t) = Q(m^j)$ if $M_t = m^j$, as implied by (4.5); we therefore call $Q(M_t)$ nature's P/D ratio. By contrast, the market P/D ratio $Q(\Pi_t)$ aggregates the agent's beliefs in the incomplete-information model; for this reason, we refer to it as the agent's price-dividend ratio.

We generate a sample of size $T = 20{,}000$ periods from the learning model (4.6) with $\bar k = 3$ volatility components and fixed parameter values.$^{8}$ Figure 2 illustrates the last 1,000 periods of the simulated sample. We report nature's price-dividend ratio in the top panel, the agent's price-dividend ratio in the middle panel, and the return (computed by the agent and observed by the econometrician) in the bottom panel.



7 The price-dividend ratio is given by

$$Q(\Pi_t) = \mathbb{E}\!\left[\sum_{n=1}^{\infty} \delta^n \left(\frac{C_{t+n}}{C_t}\right)^{-\alpha} \frac{D_{t+n}}{D_t} \,\middle|\, X_t\right] = \mathbb{E}\!\left[\sum_{n=1}^{\infty} e^{\sum_{h=1}^{n}\left[g_D - r_f - \alpha\rho_{C,D}\sigma_C\sigma_D(M_{t+h})\right]} \,\middle|\, X_t\right],$$

where $r_f = -\ln(\delta) + \alpha g_C - \alpha^2\sigma_C^2/2$ is the log interest rate. Since volatility is persistent, a high level of volatility at date $t$ implies high forecasts of future volatility, and therefore a low period-$t$ price-dividend ratio. The linear coefficients are given by $(Q(m^1), \dots, Q(m^d))' = (I - B)^{-1}\iota - \iota$, where $B = (b_{ij})_{1 \le i,j \le d}$ is the matrix with components $b_{ij} = a_{ij}\exp[g_D - r_f - \alpha\rho_{C,D}\sigma_C\sigma_D(m^j)]$ and $\iota = (1, \dots, 1)'$.

8 Specifically, we set $m_0 = 1.7$, $\gamma_{\bar k} = 0.06$, $b = 2$ and $\sigma_\delta = 1$, the consumption drift to $g_C = 0.75$ basis point (bp) (or 1.18% per year), log interest rate to $r_f = 0.42$ bp per day (1% per year), excess dividend growth equal to $g_D - r_f = 0.5$ bp per day (about 1.2% per year), consumption volatility to $\sigma_C = 0.189\%$ (or 2.93% per year), and dividend volatility $\bar\sigma_D = 0.70\%$ per day (about 11% per year). The correlation coefficient is set equal to $\rho_{C,D} = 0.6$, and $\alpha$ is chosen such that the mean of the linear coefficients in (4.5) satisfies $\bar Q = d^{-1}\sum_{i=1}^{d} Q(m^i) = 6000$ in daily units (25 in yearly units).
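The linear coefficients of the price-dividend ratio can be computed with one linear solve. The sketch below is illustrative only and rests on two assumptions: that the footnote's formula reads $Q = (I - B)^{-1}\iota - \iota$ (equivalently $\sum_{n \ge 1} B^n \iota$), and that the per-state dividend volatilities $\sigma_D(m^j)$ are supplied by the caller. The function name `pd_coefficients` and its argument names are hypothetical.

```python
import numpy as np

def pd_coefficients(A, sigma_d, gd_minus_rf, alpha, rho_cd, sigma_c):
    """Return (Q(m^1), ..., Q(m^d)) = (I - B)^{-1} iota - iota, where
    b_ij = a_ij * exp(g_D - r_f - alpha * rho_CD * sigma_C * sigma_D(m^j)).

    A        : (d, d) transition matrix, A[i, j] = a_ij
    sigma_d  : length-d vector of dividend volatilities sigma_D(m^j) (caller-supplied)
    The formula requires the spectral radius of B to be below one.
    """
    A = np.asarray(A, dtype=float)
    sigma_d = np.asarray(sigma_d, dtype=float)
    growth = np.exp(gd_minus_rf - alpha * rho_cd * sigma_c * sigma_d)
    B = A * growth[None, :]              # column j is scaled by the state-j factor
    d = A.shape[0]
    iota = np.ones(d)
    return np.linalg.solve(np.eye(d) - B, iota) - iota
```

In a two-state example where state 1 has the lower volatility, the first coefficient is the larger one, consistent with the statement that the price-dividend ratio is negatively related to volatility.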
Figure 3: Accuracy of the SOS Filter. This figure illustrates the estimated log-likelihood function (left panel) and the efficiency measures $R^2_{Q(\Pi)}$ and $R^2_{Q(M)}$ (right panel) as a function of the filter size $N$.
We apply to the entire simulated sample the SOS filter with the quasi-Cauchy kernel and bandwidth derived in the online Appendix. The left panel of Figure 3 illustrates the estimated log-likelihood as a function of the filter size $N$. In the right panel, we report the pseudo $R^2$:

$$R^2_{Q(\Pi)} = 1 - \frac{\sum_{t=1}^{T}\left[Q(\Pi_t) - \hat Q(\Pi_t)\right]^2}{\sum_{t=1}^{T}\left[Q(\Pi_t) - \bar Q(\Pi)\right]^2},$$

where $\hat Q(\Pi_t) = \sum_{n=1}^{N} Q(\Pi_t^{(n)})/N$ and $\bar Q(\Pi) = \sum_{t=1}^{T} Q(\Pi_t)/T$. We similarly compute $R^2_{Q(M)}$ for nature's price-dividend ratios using $\{Q(M_t^{(n)})\}$. The figure shows that both the estimated log-likelihood and the coefficients of determination increase with the filter size $N$ and settle down for $N \ge 10^6$. The coefficient of determination reaches 67.6% for $Q(M)$ and 71.5% for $Q(\Pi)$. Thus, the agent's P/D ratio is better estimated than nature's P/D ratio, as the information structure in Figure 1 suggests.

The true value of the likelihood function is unknown for the example considered in Figure 3. For this reason, we now consider the full-information version of the model, which, by Proposition 9, has a closed-form likelihood. We generate from the full-information model a sample of $T = 20{,}000$ periods. The analytical expression of the log-likelihood implies that $L_{FI} = 79{,}691.5$. In the right column of Table 2, we report the sample mean and the root mean squared error of fifty log-likelihood estimates computed using SOS. The relative estimation error RMSE$/L_{FI}$ is 0.024%, 0.006% and 0.002% when using, respectively, $N = 10^5$, $10^6$ and $10^7$ particles. The estimates of the FI log-likelihood obtained using SOS are therefore remarkably precise.

We now verify that the SOS filter defeats the curse of dimensionality with respect to the size of the state space. Table 1 reports the topological dimension of the state space, $\dim \mathbb{S}$, under incomplete and full information. By construction, the log-likelihood function satisfies the continuity property: $\lim_{\sigma_\delta \to 0} L_{II}(m_0, \gamma_{\bar k}, b, \sigma_\delta | R_T) = L_{FI}(m_0, \gamma_{\bar k}, b | R_T)$. The first three columns in Table 2 report summary statistics of log-likelihood estimates of $L_{II}$ obtained for $\sigma_\delta \in \{1, 0.1, 0.01\}$. The accuracy of SOS is nearly identical for the full-information model and for the learning model with $\sigma_\delta = 0.01$. With $N = 10^7$ particles, the RMSE of the SOS filter is even slightly smaller for the II specification $\sigma_\delta = 0.01$ than for the full-information model, even though II has a much larger state space. These findings confirm the result of Theorem 4 that the convergence rate of SOS is independent of the dimension of the state space.
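The pseudo $R^2$ used in this subsection as an efficiency measure compares the true simulated path with the particle average. The following is a minimal sketch under the assumption that the measure is one minus the ratio of squared tracking errors to the squared deviations of the true path from its sample mean; `pseudo_r2` and its argument names are hypothetical.

```python
import numpy as np

def pseudo_r2(q_true, q_particles):
    """Pseudo R^2 of a filtered aggregate.

    q_true      : length-T array with the true simulated values, e.g. Q(Pi_t)
    q_particles : (T, N) array with the values Q(Pi_t^(n)) carried by the particles
    """
    q_true = np.asarray(q_true, dtype=float)
    q_hat = np.asarray(q_particles, dtype=float).mean(axis=1)  # particle average per period
    sse = np.sum((q_true - q_hat) ** 2)                        # tracking error
    sst = np.sum((q_true - q_true.mean()) ** 2)                # variation of the true path
    return 1.0 - sse / sst
```

A value of one means that the particle average reproduces the true path exactly, and a value of zero means that it does no better than the sample mean of the true path.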
Table 1: Dimension of the state space^a

                       Incomplete Information                       Full Information
State space S          $\{m^1,\dots,m^d\} \times \Delta_+^{d-1}$    $\{m^1,\dots,m^d\}$
Dimension dim S        $d - 1$                                      0

a This table reports the topological dimension of the state space under full and incomplete information. In the multifrequency volatility case, we know that $d = 2^{\bar k}$, where $\bar k$ denotes the number of volatility frequencies.

Table 2: Precision of the SOS log-likelihood estimates^a

                                  II (dim S = 7)                                      FI (dim S = 0)
                     $\sigma_\delta = 1$   $\sigma_\delta = 0.1$   $\sigma_\delta = 0.01$
Mean, N = 10^5       79,514.1              79,674.0                79,673.1           79,673.4
Mean, N = 10^6       79,523.2              79,686.3                79,687.4           79,687.3
Mean, N = 10^7       79,525.1              79,690.8                79,690.9           79,690.4
RMSE, N = 10^5       177.6                 18.3                    19.5               18.9
RMSE, N = 10^6       168.4                 6.3                     5.2                4.9
RMSE, N = 10^7       166.4                 1.1                     1.3                1.7

a We report summary statistics for 50 simulated log-likelihoods estimated on a fixed sample path of $T = 20{,}000$ periods from the FI model. The true log-likelihood is $L_{FI} = 79{,}691.5$. The simulated log-likelihoods are based on an SOS filter and a learning model with $\sigma_\delta \in \{0.01, 0.1, 1\}$.



4.3 Indirect Inference Estimator 

We now develop an estimator for the vector of structural parameters:

$$\theta = (m_0, \gamma_{\bar k}, b, \sigma_\delta)' \in [1, 2] \times (0, 1] \times [1, \infty) \times \mathbb{R}_+,$$

where $m_0$ controls the variability of dividend volatility, $\gamma_{\bar k}$ the transition probability of the most transitory volatility component, $b$ the spacing of the transition probabilities, and $\sigma_\delta$ the precision of the signal received by the representative agent. As is traditional in the asset pricing literature, we calibrate all the other parameters on aggregate consumption data and constrain the mean price-dividend ratio to a plausible long-run value

$$\mathbb{E}[Q(\Pi_t)] = \bar Q, \tag{4.7}$$

where $\bar Q$ is set equal to 25 in yearly units.$^{9}$

The learning economy is specified by $p = 4$ parameters, $\theta = (m_0, \gamma_{\bar k}, b, \sigma_\delta)'$, while the FI economy is specified by $q = 3$ parameters, $\phi = (m_0, \gamma_{\bar k}, b)'$. For this reason, the definition of the auxiliary estimator requires an additional statistic $\hat\eta_T \in \mathbb{R}$. Since the noise parameter $\sigma_\delta$ controls the skewness of excess returns, the third moment seems like a natural choice. We are concerned, however, that the third moment may be too noisy to produce an efficient estimator of $\theta$. For this reason, we consider an alternative based on the observation that by restriction (4.7), the mean return is nearly independent of the structural parameter:

$$\mathbb{E}(r_t) \approx \ln(1 + 1/\bar Q) + g_D - r_f - \bar\sigma_D^2/2, \tag{4.8}$$

as is verified in the online appendix. Since the mean is fixed, the median can be used as a robust measure of skewness. The auxiliary estimator $\hat\mu_T = (\hat\phi_T', \hat\eta_T)'$ is defined by expanding the ML estimator of the full-information economy, with either the third moment ($\hat\eta_T = T^{-1}\sum_{t=1}^{T} r_t^3$) or median ($\hat\eta_T = \mathrm{median}\{r_t\}$) of returns.

In Figure 4, we illustrate the relation between the median-based auxiliary estimator $\hat\mu_T$ and the structural parameter $\theta$ on a long simulated sample of length $ST = 10^7$. The graphs can be viewed as cuts of the binding function $\mu(\theta)$. The top three rows show that for all $i \in \{1, 2, 3\}$, the auxiliary parameter $\hat\mu_{T,i}$ increases monotonically with the corresponding parameter $\theta_i$ of the learning economy, and is much less sensitive to the other parameters $\theta_j$, $j \neq i$ (including $\sigma_\delta$). Moreover, we note that the auxiliary estimator of $b$, based on FI ML, is a biased estimator of the parameter $b$ of the incomplete-information economy; this finding illustrates the pitfalls of employing quasi-maximum likelihood estimation in this setting. The bottom row shows that as the noise parameter $\sigma_\delta$ increases, the median return increases monotonically, consistent with the fact that returns become more negatively skewed. In the online appendix, we verify that the third moment is decreasing monotonically with $\sigma_\delta$. The structural parameter $\theta$ is thus well identified by our two candidate auxiliary estimators.

9 The calibrated parameters are the same as in the previous subsection. An alternative approach would be to estimate all the parameters of the learning economy on aggregate excess return data. In the 2005 NBER version of their paper, CF applied this method to the FI model and obtained broadly similar results to the ones reported in the published version. This alternative approach has the disadvantage of not taking into account the economic constraints imposed by the model, and we do not pursue it here.

Figure 4: Auxiliary Estimator. This figure illustrates the relation between the median-based auxiliary estimator and the structural parameter $\theta$. In each column, one structural parameter is allowed to vary while the other three parameters are set to their reference values. The auxiliary estimate reported for every $\theta$ is obtained from a simulated sample of length $10^7$ generated under the learning model $\theta$ with $\bar k = 3$ volatility components.

As a benchmark, we also construct a simulated method of moments (SMM) estimator. In the online appendix, we illustrate the impact of the structural parameter $\theta$ on the expected values of $r_t^n$, $n \in \{1, \dots, 4\}$, the leverage coefficient $r_{t-1} r_t^2$, and the volatility autocorrelation measure $r_{t-1}^2 r_t^2$. The leverage measure and the second, third and fourth moments appear to be the most sensitive to the structural parameter $\theta$, and are therefore selected for the definition of the SMM estimator.

In Figure 5, we report boxplots of SMM, third moment-based and median-based II estimates of $\theta$ obtained from 100 simulated sample paths of length $T = 20{,}000$ from the learning model with $\bar k = 3$ volatility components. For all three estimators, we set the simulation size to $S = 500$, so that each simulated path contains $ST = 10^7$ simulated data points. The indirect inference procedures provide more accurate and less biased estimates of the structural parameters of the learning economy than SMM. The median-based estimator provides substantially more accurate estimates of the parameter $\sigma_\delta$ that controls the agent's information quality. The median-based estimator thus strongly dominates the other two candidate estimators, and we now use it empirically. Overall, the Monte Carlo simulations confirm the excellent properties of the filtering and estimation techniques proposed in the paper.

4.4 Empirical Estimates and Value at Risk Forecasts 

We apply our estimation methodology to the daily log excess returns on the U.S. CRSP value-weighted equity index from 2 January 1926 to 31 December 2009. The dataset contains 22,276 observations, which are illustrated in Figure 6. We partition the dataset into an in-sample period, which runs until 31 December 1999, and an out-of-sample period, which covers the remaining ten years.

In Table 3, we report the II estimates of $\theta$. We let $ST = 10^7$ and report standard errors in parentheses. The estimate of $\sigma_\delta$ is significant and declines with $\bar k$.$^{10}$ This finding is consistent with the intuition that as $\bar k$ increases, the effect of learning becomes increasingly powerful, and a lower $\sigma_\delta$ better matches the negatively skewed excess return series. We also report the log-likelihood of each specification, which is estimated by an SOS filter with $N = 10^7$ particles every period. The likelihood function of the II model increases steadily with $\bar k$. We report in parentheses the t-ratios of a HAC-adjusted Vuong (1989) test, that is, the rescaled differences between the log-likelihoods of the lower-dimensional ($\bar k \in \{1, 2, 3\}$) and the highest-dimensional ($\bar k = 4$) specifications. The four-component model has a significantly higher likelihood than the other specifications and is therefore selected for the out-of-sample analysis.

10 When $\bar k = 1$, the auxiliary parameter is nearly invariant to $\sigma_\delta$ in the relevant region of the parameter space. The Jacobian of the binding function is almost singular, and by (3.7), the estimator of $\sigma_\delta$ has a very large asymptotic variance. The specification with $\bar k = 1$ cannot match the median of historical returns and is therefore severely misspecified. These findings illustrate the empirical importance of using higher values of $\bar k$.

Figure 5: Monte Carlo Simulations of the Learning Model Estimators. This figure illustrates boxplots of the structural parameter estimates obtained using SMM (left boxplot of each panel), the indirect inference estimator based on the third moment (middle boxplots), and the median-based indirect inference estimator (right boxplots). The horizontal lines correspond to the true value of each parameter.

Figure 6: U.S. Equity Return Data. This figure illustrates the daily log excess returns on the CRSP U.S. value-weighted equity index between 2 January 1926 and 31 December 2009. The dashed line separates the in-sample and out-of-sample periods.

Table 3: Empirical Estimates^a

                         Parameter Estimates                              Estimated Likelihood
$\bar k$    $m_0$       $\gamma_{\bar k}$   $b$          $\sigma_\delta$      (in logs)
1           1.732       0.063                            93.807               65,680.1
            (0.0091)    (0.0033)                         (61,616.8)           (-10.4948)
2           1.714       0.054               21.104       4.001                67,104.9
            (0.0061)    (0.0036)            (10.5573)    (1.1036)             (-8.0477)
3           1.690       0.071               16.471       2.401                67,534.7
            (0.0055)    (0.0055)            (9.9115)     (1.5599)             (-8.6697)
4           1.587       0.047               5.089        1.411                68,167.8
            (0.0059)    (0.0049)            (0.5387)     (0.1714)

a We report empirical estimates of the learning model (with standard errors in parentheses) based on the daily excess returns of the CRSP index between 2 January 1926 and 31 December 1999. The log-likelihood estimates are based on an SOS filter containing $N = 10^7$ particles. HAC-adjusted Vuong tests comparing $\bar k \le 3$ specifications to $\bar k = 4$ are reported in parentheses below the log-likelihood estimates.

We now turn to the out-of-sample implications of the incomplete-information model. The value at risk $VaR^p_{t+1}$ constructed on day $t$ is such that the return on day $t + 1$ will be lower than $-VaR^p_{t+1}$ with probability $p$. The failure rate is specified as the fraction of observations where the actual loss exceeds the value at risk. In a well-specified VaR model, the failure rate is on average equal to $p$. We use as a benchmark historical simulations (e.g. Christoffersen 2009) and Student GARCH(1,1), which are widely used in practice. The historical VaR estimates are based on a window of 60 days, which corresponds to a calendar period of about three months. In Table 4, we report the failure rates of the $VaR^p_{t+1}$ forecasts for $p = 1\%, 5\%, 10\%$, at horizons of 1 and 5 days produced by: historical simulations, GARCH, the full-information model and the learning model with $\bar k = 4$. Standard deviations are reported in parentheses. A failure rate is in bold characters if it differs from its theoretical value at the 1% significance level.
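The historical-simulation benchmark and the failure-rate statistic described above can be sketched in a few lines. This is an illustration, not the procedure used for Table 4: the function names are hypothetical, the rolling empirical quantile is one common way to implement historical simulation, and the binomial standard error (which ignores serial dependence in the hits) is an assumption.

```python
import numpy as np

def historical_var(returns, p, window=60):
    """One-day historical-simulation VaR: minus the empirical p-quantile of the
    previous `window` returns. Entry t is the forecast for the return on day t,
    built from days t-window, ..., t-1; the first `window` entries are NaN."""
    r = np.asarray(returns, dtype=float)
    var = np.full(r.shape, np.nan)
    for t in range(window, len(r)):
        var[t] = -np.quantile(r[t - window:t], p)
    return var

def failure_rate(returns, var):
    """Fraction of days on which the return falls below -VaR, together with a
    binomial standard error (assumption: hits treated as independent)."""
    r = np.asarray(returns, dtype=float)
    v = np.asarray(var, dtype=float)
    ok = ~np.isnan(v)                      # keep days with a forecast
    hits = r[ok] < -v[ok]
    rate = hits.mean()
    se = np.sqrt(rate * (1.0 - rate) / hits.size)
    return rate, se
```

For a well-specified model, `rate` should be close to `p`, and the distance between the two can be judged against `se`.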
Table 4: Failure rates of value-at-risk forecasts^a

                              One Day                            Five Days
Models               1%        5%        10%            1%        5%        10%
Historical VaR                 0.069     0.119                    0.066     0.129
                               (0.0051)  (0.0065)                 (0.0111)  (0.0150)
GARCH                0.081     0.154     0.197          0.048     0.123     0.165
                     (0.0054)  (0.0072)  (0.0079)       (0.0095)  (0.0147)  (0.0166)
FI, $\bar k = 4$     0.016     0.070     0.132          0.012     0.068     0.143
                     (0.0025)  (0.0051)  (0.0067)       (0.0048)  (0.0112)  (0.0156)
II, $\bar k = 4$     0.008     0.047     0.094          0.014     0.060     0.135
                     (0.0018)  (0.0042)  (0.0058)       (0.0052)  (0.0106)  (0.0153)

a This table reports the failure rates of the 1-day and 5-day value at risk forecasts produced by various methods in the out-of-sample period (2000-2009). The historical VaR is based on a rolling window of 60 days. The GARCH, FI and II forecasts are computed using in-sample parameter estimates. II forecasts are based on an SOS filter with $N = 10^7$ elements. The significance level is 1%.



Historical simulations provide inaccurate VaR forecasts at the 1-day horizon. The failure rates are significantly higher than their theoretical values, which suggests that historical simulations provide overly optimistic estimates of value at risk. GARCH failure rates are significantly higher than their theoretical values in all cases, while the FI model's VaR predictions are rejected in three out of six cases. On the other hand, the VaR predictions from the learning model are all consistent with the data. Our empirical findings suggest that the learning model captures well the dynamics of daily stock returns, and outperforms out of sample some of the best reduced-form specifications. We note that this is an excellent result for a consumption-based asset pricing model.

5 Conclusion 

In this paper, we have developed powerful filtering and estimation methods for a wide 
class of learning environments. The new SOS algorithm applies to general state space 
models in which state-observation pairs can be conveniently simulated. Our method makes 
no assumption on the availability of the observation density and therefore expands the 
scope of sequential Monte Carlo methods. The rate of convergence does not depend on 
the size of the state space, which shows that our filter defeats a form of the curse of 
dimensionality. Among many possible applications, SOS is useful to estimate the likelihood
function, conduct likelihood-based specification tests, and generate forecasts. 

The new filter naturally applies to nonlinear economies with agent learning of the type 
often considered in financial economics. In this context, SOS makes it possible to track in real
time both fundamentals and agent beliefs about fundamentals. Estimation can proceed
by simulated ML, but this approach can be computationally costly, as in the example of 
section 4. For this reason, we have defined an indirect inference estimator by expanding 
the full-information MLE with a set of statistics that agent learning is designed to capture. 

These methods have been applied to a consumption-based asset pricing model with 
investor learning about multifrequency volatility. We have verified by Monte Carlo simula- 
tions the accuracy of our SOS filter and indirect inference estimators, and have implemented 
these techniques on a long series of daily excess stock returns. We have estimated the pa- 
rameters driving fundamentals and the quality of the signals received by investors, tracked 
fundamentals and investor beliefs over time, and verified that the inferred specification 
provides good value-at-risk forecasts out of sample. 

The paper opens multiple directions for future research. SOS can be used to price complex instruments, such as derivatives contracts, which crucially depend on the distribution of the hidden state. We can expand the role of learning in the analysis, for instance by letting the agent learn the parameter of the economy over time, or by conducting the joint online estimation of the structural parameter $\theta$ and the state of the economy $s_t$, as in Polson, Stroud, and Mueller (2008) and Storvik (2002). Further extensions could include inference for equilibrium models with asymmetric information (e.g. Biais, Bossaerts, and Spatt, 2010), and the development of value-at-risk models that incorporate the cross-sectional dispersion of investor beliefs.
A Convergence of the SOS Filter (Section 2)

A.1 A Preliminary Result

In this appendix, we show the convergence of the SOS particle filter defined in section 2 as the number of particles $N$ goes to infinity. Since the path $R_T$ is fixed, our focus is on simulation noise, and expectations in this section are over all the realizations of the random particle method. We begin by establishing the following result for a given $N \ge 1$ and $t \ge 1$.

Lemma A1. Assume that there exists $U_{t-1}(N)$ such that for every bounded measurable function $\Phi : \mathbb{S} \to \mathbb{R}$,

$$\mathbb{E}\left\{\left[\frac{1}{N}\sum_{n=1}^{N} \Phi(s_{t-1}^{(n)}) - \mathbb{E}[\Phi(s_{t-1})|R_{t-1}]\right]^2\right\} \le U_{t-1}(N)\,\|\Phi\|^2. \tag{A.1}$$

Let $U_t^*(N) = 2\kappa_t'^{2} A(K)^2 h_t^4 + B(K)\kappa_t/(N h_t^{n_R}) + 2U_{t-1}(N)\kappa_t^2$. Then, the inequality

$$\mathbb{E}\left\{\left[\frac{1}{N}\sum_{n=1}^{N} \Phi(\tilde s_t^{(n)}) K_{h_t}(r_t - \tilde r_t^{(n)}) - f_R(r_t|R_{t-1})\,\mathbb{E}[\Phi(s_t)|R_t]\right]^2\right\} \le U_t^*(N)\,\|\Phi\|^2$$

holds for every bounded measurable function $\Phi$.
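Proof of Lemma A1. We consider the function

$$a_{t-1}(s_{t-1}) = \int \Phi(s_t) K_{h_t}(r_t - \tilde r_t)\, g(ds_t, d\tilde r_t | s_{t-1}, R_{t-1}).$$

We note that

$$|a_{t-1}(s_{t-1})| \le \|\Phi\| \int K_{h_t}(r_t - \tilde r_t)\, g(ds_t, d\tilde r_t | s_{t-1}, R_{t-1}) = \|\Phi\| \int K_{h_t}(r_t - \tilde r_t) f_R(\tilde r_t | s_{t-1}, R_{t-1})\, d\tilde r_t.$$

The function $a_{t-1}$ is therefore bounded above by $\kappa_t \|\Phi\|$.

The difference $Z = N^{-1}\sum_{n=1}^{N} \Phi(\tilde s_t^{(n)}) K_{h_t}(r_t - \tilde r_t^{(n)}) - f_R(r_t|R_{t-1})\,\mathbb{E}[\Phi(s_t)|R_t]$ is the sum of the following three terms:

$$Z_1 = \frac{1}{N}\sum_{n=1}^{N}\left[\Phi(\tilde s_t^{(n)}) K_{h_t}(r_t - \tilde r_t^{(n)}) - a_{t-1}(s_{t-1}^{(n)})\right],$$

$$Z_2 = \frac{1}{N}\sum_{n=1}^{N} a_{t-1}(s_{t-1}^{(n)}) - \int a_{t-1}(s_{t-1})\,\lambda(ds_{t-1}|R_{t-1}),$$

$$Z_3 = \int a_{t-1}(s_{t-1})\,\lambda(ds_{t-1}|R_{t-1}) - f_R(r_t|R_{t-1})\,\mathbb{E}[\Phi(s_t)|R_t].$$

Let $S_{t-1} = (s_{t-1}^{(1)}, \dots, s_{t-1}^{(N)})$ denote the vector of period-$(t-1)$ particles. Conditional on $S_{t-1}$, $Z_1$ has a zero mean, while $Z_2$ and $Z_3$ are deterministic. Hence:

$$\mathbb{E}(Z^2) = \mathbb{E}(Z_1^2) + \mathbb{E}[(Z_2 + Z_3)^2] \le \mathbb{E}(Z_1^2) + 2\mathbb{E}(Z_2^2) + 2\mathbb{E}(Z_3^2).$$

Conditional on $S_{t-1}$, the state-observation pairs $\{(\tilde s_t^{(n)}, \tilde r_t^{(n)})\}_{n=1}^{N}$ are independent, and each $(\tilde s_t^{(n)}, \tilde r_t^{(n)})$ is drawn from $g(\cdot | s_{t-1}^{(n)}, R_{t-1})$; the addends $\Phi(\tilde s_t^{(n)}) K_{h_t}(r_t - \tilde r_t^{(n)}) - a_{t-1}(s_{t-1}^{(n)})$ of $Z_1$ are thus independent and have mean zero. We infer that the conditional expectation of $Z_1^2$ is bounded above by:

$$\frac{1}{N^2}\sum_{n=1}^{N} \int \Phi(\tilde s_t)^2 K_{h_t}(r_t - \tilde r_t)^2\, g(d\tilde s_t, d\tilde r_t | s_{t-1}^{(n)}, R_{t-1}) \le \frac{\kappa_t \|\Phi\|^2}{N} \int K_{h_t}(r_t - \tilde r_t)^2\, d\tilde r_t.$$

We apply the change of variable $u = (r_t - \tilde r_t)/h_t$:

$$\int K_{h_t}(r_t - \tilde r_t)^2\, d\tilde r_t = \frac{B(K)}{h_t^{n_R}},$$

and infer that $\mathbb{E}(Z_1^2) \le \|\Phi\|^2 B(K)\kappa_t/(N h_t^{n_R})$.

Since the function $a_{t-1}(s_{t-1})$ is bounded above by $\kappa_t \|\Phi\|$, we infer from (A.1) that:

$$\mathbb{E}(Z_2^2) \le U_{t-1}(N)\,\kappa_t^2\,\|\Phi\|^2.$$

Finally, we observe that $f_R(r_t|R_{t-1})\,\mathbb{E}[\Phi(s_t)|R_t] = \int \Phi(s_t) f_R(r_t|s_t, R_{t-1})\,\lambda(ds_t|R_{t-1})$, and therefore

$$Z_3 = \int \Phi(s_t)\left[\int K_{h_t}(r_t - \tilde r_t) f_R(\tilde r_t|s_t, R_{t-1})\, d\tilde r_t - f_R(r_t|s_t, R_{t-1})\right]\lambda(ds_t|R_{t-1})$$

$$\phantom{Z_3} = \int \Phi(s_t)\left\{\int K(u)\left[f_R(r_t - h_t u|s_t, R_{t-1}) - f_R(r_t|s_t, R_{t-1})\right] du\right\}\lambda(ds_t|R_{t-1}).$$

Note that $\left|\int K(u)[f_R(r_t - h_t u|s_t, R_{t-1}) - f_R(r_t|s_t, R_{t-1})]\,du\right| \le \kappa_t' A(K) h_t^2$. Hence $|Z_3| \le \kappa_t' A(K) h_t^2 \|\Phi\|$ and therefore $\mathbb{E}(Z_3^2) \le \kappa_t'^{2} A(K)^2 h_t^4 \|\Phi\|^2$. We conclude that the lemma holds. Q.E.D.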



A.2 Proof of Theorem 4

The proof of (2.5) proceeds by induction. When $t = 0$, the particles are drawn from the prior $\lambda_0$, and the conditional expectation is computed under the same prior. Hence the property (2.5) holds with $U_0(N) = 1/N$.

We now assume that the property (2.5) holds at date $t - 1$. The estimation error $X = N^{-1}\sum_{n=1}^{N} \Phi(s_t^{(n)}) - \mathbb{E}[\Phi(s_t)|R_t]$ is the sum of:

$$X_1 = \frac{1}{N}\sum_{n=1}^{N} \Phi(s_t^{(n)}) - \sum_{n=1}^{N} p_t^{(n)} \Phi(\tilde s_t^{(n)}),$$

$$X_2 = \left[\sum_{n=1}^{N} p_t^{(n)} \Phi(\tilde s_t^{(n)})\right]\left[1 - \frac{N^{-1}\sum_{n'=1}^{N} K_{h_t}(r_t - \tilde r_t^{(n')})}{f_R(r_t|R_{t-1})}\right],$$

$$X_3 = \frac{1}{f_R(r_t|R_{t-1})}\left[\frac{1}{N}\sum_{n=1}^{N} \Phi(\tilde s_t^{(n)}) K_{h_t}(r_t - \tilde r_t^{(n)})\right] - \mathbb{E}[\Phi(s_t)|R_t],$$

where $p_t^{(n)} = K_{h_t}(r_t - \tilde r_t^{(n)}) / \sum_{n'=1}^{N} K_{h_t}(r_t - \tilde r_t^{(n')})$ denotes the normalized resampling weight. The first term, $X_1$, corresponds to step 3 resampling, the second term to the normalization of the resampling weights, and the third term to the error in the estimation of $\Phi$ using the nonnormalized weights.

Conditional on $\{(\tilde s_t^{(n)}, \tilde r_t^{(n)})\}_{n=1}^{N}$, the particles $s_t^{(n)}$ are independent and identically distributed, and $X_1$ has mean zero. We infer that $\mathbb{E}[X_1^2 | \{(\tilde s_t^{(n)}, \tilde r_t^{(n)})\}_{n=1}^{N}] \le \|\Phi\|^2/N$, and therefore $\mathbb{E}(X_1^2) \le \|\Phi\|^2/N$. Note that when we use stratified, residual or combined stratified-residual resampling in step 3, the inequality $\mathbb{E}(X_1^2) \le \|\Phi\|^2/N$ remains valid, and smaller upper bounds can also be derived.$^{11}$

Conditional on $\{(\tilde s_t^{(n)}, \tilde r_t^{(n)})\}_{n=1}^{N}$, $X_2$ and $X_3$ are deterministic variables. The mean squared error satisfies:

$$\mathbb{E}(X^2) = \mathbb{E}(X_1^2) + \mathbb{E}[(X_2 + X_3)^2] \le \mathbb{E}(X_1^2) + 2\mathbb{E}(X_2^2) + 2\mathbb{E}(X_3^2).$$

We note that $|X_2| \le \|\Phi\|\,[f_R(r_t|R_{t-1})]^{-1}\left|f_R(r_t|R_{t-1}) - \sum_{n'=1}^{N} K_{h_t}(r_t - \tilde r_t^{(n')})/N\right|$. Using the induction hypothesis at date $t - 1$, we apply Lemma A1 with $\Phi \equiv 1$ and obtain that $\mathbb{E}(X_2^2)$ is bounded above by:

$$\frac{U_t^*(N)\,\|\Phi\|^2}{[f_R(r_t|R_{t-1})]^2}. \tag{A.2}$$

Lemma A1 implies that $\mathbb{E}(X_3^2)$ is also bounded above by (A.2). We conclude that $\mathbb{E}(X^2) \le U_t(N)\|\Phi\|^2$, where $U_t(N) = 4U_t^*(N)[f_R(r_t|R_{t-1})]^{-2} + N^{-1}$, or equivalently

$$U_t(N) = \frac{4}{[f_R(r_t|R_{t-1})]^2}\left[2\kappa_t'^{2} A(K)^2 h_t^4 + \frac{B(K)\kappa_t}{N h_t^{n_R}} + 2U_{t-1}(N)\kappa_t^2\right] + \frac{1}{N}. \tag{A.3}$$

This establishes part (2.5) of the theorem. From (2.5) and Lemma A1 with $\Phi \equiv 1$, (2.4) follows.

Assume now that the bandwidth is a function of $N$, and that assumption 3 holds. A simple recursion implies that $\lim_{N \to \infty} U_t(N) = 0$ for all $t$. The mean squared error converges to zero for any bounded measurable function $\Phi$.

We now characterize the rate of convergence. Given $U_{t-1}(N)$, we know that the coefficient $U_t(N)$ defined by (A.3) is minimal if

$$h_t = N^{-1/(n_R+4)}\left[\frac{\kappa_t\, n_R\, B(K)}{8\kappa_t'^{2} A(K)^2}\right]^{1/(n_R+4)}. \tag{A.4}$$

More generally, if the bandwidth sequence is of the form $h_t(N) = h_t(1)/N^{1/(n_R+4)}$, then $U_t(N)$ is of the form:

$$U_t(N) = u_{1,t}\, N^{-4/(n_R+4)} + u_{2,t}\, U_{t-1}(N) + N^{-1}, \tag{A.5}$$

where $u_{1,t}$ and $u_{2,t}$ are finite nonnegative coefficients.$^{12}$ By a simple recursion, $U_t(N)$ is of order $N^{-4/(n_R+4)}$ for all $t$. Q.E.D.

11 See Cappé, O., Moulines, E., and T. Rydén (2005, ch. 7) for a detailed discussion of sampling variance.

12 We verify that $u_{1,t} = 4[f_R(r_t|R_{t-1})]^{-2}[2\kappa_t'^{2} h_t(1)^4 A(K)^2 + B(K)\kappa_t h_t(1)^{-n_R}]$ and $u_{2,t} = 8\kappa_t^2 [f_R(r_t|R_{t-1})]^{-2}$.
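The three sources of error discussed in the proof (resampling, weight normalization, and kernel estimation with nonnormalized weights) correspond to the stages of one filtering step. The sketch below illustrates such a step under several assumptions that are not taken from the paper: a Gaussian product kernel is swapped in for the quasi-Cauchy kernel, multinomial resampling is used, the bandwidth follows the rate $h = h(1) N^{-1/(n_R+4)}$, and `simulate_pair` is a hypothetical user-supplied function that draws state-observation pairs from the model.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal product kernel on R^{n_R}; each row of u is a point."""
    n_r = u.shape[1]
    return np.exp(-0.5 * np.sum(u ** 2, axis=1)) / (2.0 * np.pi) ** (n_r / 2.0)

def sos_step(particles, r_obs, simulate_pair, h1, rng):
    """One state-observation sampling step (illustrative sketch).

    particles     : (N, d_s) array of period t-1 particles
    r_obs         : observed r_t (scalar or length-n_R vector)
    simulate_pair : hypothetical callable (particles, rng) -> (states, observations)
                    with shapes (N, d_s) and (N, n_R)
    h1            : bandwidth constant h(1)
    Returns the resampled period-t particles and the kernel estimate of
    f_R(r_t | R_{t-1}).
    """
    n = particles.shape[0]
    r_obs = np.atleast_1d(np.asarray(r_obs, dtype=float))
    n_r = r_obs.size
    h = h1 * n ** (-1.0 / (n_r + 4))                  # bandwidth rate from the proof
    s_new, r_sim = simulate_pair(particles, rng)      # simulate (state, observation) pairs
    w = gaussian_kernel((r_obs - r_sim) / h) / h ** n_r   # nonnormalized weights K_h
    density_hat = w.mean()                            # estimate of the conditional density
    idx = rng.choice(n, size=n, p=w / w.sum())        # multinomial resampling
    return s_new[idx], density_hat

def toy_pair(s, rng):
    """Toy model for illustration only: constant state, observation = state + N(0,1)."""
    return s, s[:, :1] + rng.normal(size=(s.shape[0], 1))
```

Summing the logarithm of `density_hat` over periods gives a simulated log-likelihood. The observation density itself is never evaluated; only draws from the model are needed.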
B Learning Economies (Section 3)



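B.1 Proof of Proposition 6

We infer from Bayes' rule that

$$\Pi_t^j \propto \underbrace{f_X(x_t|M_t = m^j, X_{t-1}; \theta)}_{=\, f_X(x_t|M_t = m^j;\,\theta)\ \text{by As. 5(b)}}\ \mathbb{P}(M_t = m^j|X_{t-1}; \theta),$$

where

$$\mathbb{P}(M_t = m^j|X_{t-1}; \theta) = \sum_{i=1}^{d} \underbrace{\mathbb{P}(M_t = m^j|M_{t-1} = m^i, X_{t-1}; \theta)}_{=\, a_{i,j}\ \text{by As. 5(a)}}\ \mathbb{P}(M_{t-1} = m^i|X_{t-1}; \theta),$$

and Proposition 6 holds. Q.E.D.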
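The recursion in this proof is a standard predict-then-update step on the belief vector. A minimal sketch follows; the function name `update_beliefs` and the array layout are assumptions made for illustration.

```python
import numpy as np

def update_beliefs(prior, A, lik):
    """One step of the agent's Bayesian belief recursion.

    prior : length-d vector, prior[i] = P(M_{t-1} = m^i | X_{t-1})
    A     : (d, d) transition matrix, A[i, j] = a_ij = P(M_t = m^j | M_{t-1} = m^i)
    lik   : length-d vector, lik[j] = f_X(x_t | M_t = m^j)
    Returns the posterior vector Pi_t.
    """
    prior = np.asarray(prior, dtype=float)
    A = np.asarray(A, dtype=float)
    lik = np.asarray(lik, dtype=float)
    predicted = prior @ A          # P(M_t = m^j | X_{t-1}) = sum_i a_ij * prior[i]
    post = lik * predicted         # proportional to Pi_t^j
    return post / post.sum()       # normalize so the beliefs sum to one
```

For example, with `prior = [0.5, 0.5]`, `A = [[0.9, 0.1], [0.2, 0.8]]` and `lik = [1.0, 3.0]`, the predicted probabilities are `[0.55, 0.45]` and the posterior is proportional to `[0.55, 1.35]`.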

B.2 Proof of Proposition 7

Bayes' rule (3.1) implies that for every $i \in \{1, \dots, d\}$,

$$\Pi_t \,|\, M_t = m^i, s_{t-1}, \dots, s_1 \ \sim\ \Pi_t \,|\, M_t = m^i, \Pi_{t-1}. \tag{B.1}$$

Also, by Assumption 5(a),

$$\mathbb{P}(M_t = m^i | s_{t-1}, \dots, s_1; \theta) = \mathbb{P}(M_t = m^i | M_{t-1}; \theta). \tag{B.2}$$

From (B.1) and (B.2), we conclude that $s_t$ is first-order Markov.

We know from Kaijser (1975) that under the conditions stated in the proposition, the belief process $\Pi_t$ has a unique invariant distribution. Proposition 2.1 in van Handel (2009) implies that $(M_t, \Pi_t)$ also has a unique invariant measure $\lambda_\infty$.$^{13}$ We infer from the Birkhoff-Khinchin theorem that for any integrable function $\Phi : \mathbb{S} \to \mathbb{R}$, the sample average $T^{-1}\sum_{t=1}^{T} \Phi(s_t)$ converges almost surely to the expectation of $\Phi$ under the invariant measure $\lambda_\infty$. Q.E.D.

13 Chigansky (2006) derives a similar result in continuous time.
B.3 Proof of Proposition 9

The econometrician recursively applies Bayes' rule:

$$\mathbb{P}(M_t = m^j | R_t; \phi) = \frac{f_{R,FI}(r_t|M_t = m^j, R_{t-1}; \phi)\,\mathbb{P}(M_t = m^j|R_{t-1}; \phi)}{f_{R,FI}(r_t|R_{t-1}; \phi)}.$$

Since $f_{R,FI}(r_t|M_t = m^j, R_{t-1}; \phi) = \sum_{i=1}^{d} f_{i,j}(r_t; \phi)\,\mathbb{P}(M_{t-1} = m^i|M_t = m^j, R_{t-1}; \phi)$, we infer that $f_{R,FI}(r_t|M_t = m^j, R_{t-1}; \phi)\,\mathbb{P}(M_t = m^j|R_{t-1}; \phi) = \sum_{i=1}^{d} f_{i,j}(r_t; \phi)\,\mathbb{P}(M_{t-1} = m^i, M_t = m^j|R_{t-1}; \phi)$, and therefore

$$\mathbb{P}(M_t = m^j | R_t; \phi) = \frac{\sum_{i=1}^{d} a_{i,j}\, f_{i,j}(r_t; \phi)\,\mathbb{P}(M_{t-1} = m^i|R_{t-1}; \phi)}{f_{R,FI}(r_t|R_{t-1}; \phi)}.$$

The econometrician's conditional probabilities are therefore computed recursively.

Since the conditional probabilities $\mathbb{P}(M_t = m^j|R_t; \phi)$ add up to unity, the conditional density of $r_t$ satisfies

$$f_{R,FI}(r_t|R_{t-1}; \phi) = \sum_{i=1}^{d}\sum_{j=1}^{d} a_{i,j}\, f_{i,j}(r_t; \phi)\,\mathbb{P}(M_{t-1} = m^i|R_{t-1}; \phi).$$

The log-likelihood function $L_{FI}(\phi|R_T) = \sum_{t=1}^{T} \ln f_{R,FI}(r_t|R_{t-1}; \phi)$ thus has an analytical expression. Q.E.D.
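The recursion of this proof translates directly into a forward filter. The sketch below assumes that the transition-specific densities $f_{i,j}(r_t; \phi)$ have already been evaluated and stored in an array; the function name `fi_log_likelihood` and the array layout are hypothetical.

```python
import numpy as np

def fi_log_likelihood(A, f, p0):
    """Full-information log-likelihood by forward recursion.

    A  : (d, d) transition matrix, A[i, j] = a_ij
    f  : (T, d, d) array, f[t, i, j] = f_{i,j}(r_t; phi), the density of r_t
         when the state moves from m^i to m^j (assumed precomputed)
    p0 : length-d vector of initial state probabilities P(M_0 = m^i)
    """
    A = np.asarray(A, dtype=float)
    f = np.asarray(f, dtype=float)
    p = np.asarray(p0, dtype=float)
    loglik = 0.0
    for f_t in f:
        # joint[j] = sum_i a_ij * f_ij(r_t) * P(M_{t-1} = m^i | R_{t-1})
        joint = (p[:, None] * A * f_t).sum(axis=0)
        dens = joint.sum()            # conditional density f_{R,FI}(r_t | R_{t-1})
        loglik += np.log(dens)
        p = joint / dens              # updated probabilities P(M_t = m^j | R_t)
    return loglik
```

The cost is of order $T d^2$, which remains small for the values $d = 2^{\bar k}$ considered in the paper, and it is this closed form that makes the full-information model convenient as an auxiliary model.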

B.4 Indirect Inference Estimator 

We provide a set of sufficient conditions for the asymptotic results at the end of section 
3.2, and then discuss numerical implementation. 

B.4.1 Sufficient Conditions for Convergence 

We assume that ffr maximizes a criterion ^K(rj,Rr) that does not depend on the full- 
information MLE (fyp- The auxiliary estimator p,x = ((t>' T ,fj' T y can therefore be written 
as: 

fiT = arg max Qt(/j-,Rt), (B-3) 
where Q T (fj,,R T ) = T^Lpi^, Rt) + Itffa Rt) for all p = ((/>', rj')'. 
Assumption 10 (Binding Function) Under the structural model $\theta^*$, the auxiliary 
criterion function $Q_T(\mu, R_T)$ converges in probability to $Q_\infty(\mu, \theta^*)$ for all $\mu$. Moreover, the 
function $\mu : \mathbb{R}^p \to \mathbb{R}^p$ defined by 

$$\mu(\theta) = \arg\max_{\mu}\, Q_\infty(\mu, \theta),$$

called the binding function, is injective. 

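Assumption 11 (Score) The renormalized score satisfies: 

$$\sqrt{T}\, \frac{\partial Q_T}{\partial \mu}[\mu(\theta^*), R_T] \xrightarrow{d} \mathcal{N}(0, I_0),$$

where $I_0$ is a positive-definite symmetric matrix. 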

Assumption 12 (Hessian of Criterion Function) The Hessian matrix 

$$\frac{\partial^2 Q_T}{\partial \mu\, \partial \mu'}[\mu(\theta^*), R_T]$$

is invertible and converges in probability to a nonsingular matrix $J_0$. 

Under Assumptions 10-12, the auxiliary estimator satisfies $\sqrt{T}[\hat\mu_T - \mu(\theta^*)] \xrightarrow{d} \mathcal{N}(0, W^*)$, 
where $W^* = J_0^{-1} I_0 J_0^{-1}$, and the asymptotic results at the end of section 3.2 hold (Gourieroux, 
Monfort, and Renault, 1993; Gourieroux and Monfort, 1996). 

B.4.2 Numerical Implementation 



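In the just-identified case, $\hat\mu_{ST}(\hat\theta_T) = \hat\mu_T$. Since the simulated auxiliary estimator $\hat\mu_{ST}(\theta)$ 
satisfies the first-order condition 

$$\frac{\partial Q_{ST}}{\partial \mu}[\hat\mu_{ST}(\theta), R_{ST}(\theta)] = 0,$$

the score evaluated at $[\hat\mu_T, R_{ST}(\theta)]$ vanishes at $\theta = \hat\theta_T$. 
Hence, the indirect inference estimator $\hat\theta_T$ minimizes the EMM-type objective function: 

$$\left\{ \frac{\partial Q_{ST}}{\partial \mu}[\hat\mu_T, R_{ST}(\theta)] \right\}' W_T \left\{ \frac{\partial Q_{ST}}{\partial \mu}[\hat\mu_T, R_{ST}(\theta)] \right\}, \qquad \text{(B.4)}$$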
where $W_T$ is any positive-definite weighting matrix. This property can be used to compute 
$\hat\theta_T$. For each iteration of $\theta$, the evaluation of the EMM objective function (B.4) requires 
only the evaluation of the score. By contrast, the evaluation of the objective function 
(3.6) requires the optimization of the FI likelihood in order to obtain $\hat\mu_{ST}(\theta)$. The 
computational advantage of EMM is substantial in applications where the calculation of the 
full-information MLE is expensive. 
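
To make the computational point concrete, the sketch below evaluates an EMM-type quadratic form with a single score evaluation per candidate $\theta$ and no inner optimization. It is an illustration only: `simulate` and `score` are hypothetical callables supplied by the user, and the toy model underneath (a location parameter with a sample-mean auxiliary statistic, and simulation shocks held fixed across values of $\theta$) is an assumption made here, not the model of the paper.

```python
import random


def emm_objective(theta, mu_hat, simulate, score, weight):
    """Quadratic form g' W g, where g is the auxiliary score evaluated at
    the data-based estimate mu_hat on a sample simulated under theta."""
    sim = simulate(theta)    # hypothetical simulator for R_ST(theta)
    g = score(mu_hat, sim)   # list of p score components
    p = len(g)
    return sum(g[i] * weight[i][j] * g[j] for i in range(p) for j in range(p))


# Toy illustration (assumed): r = theta + shock, auxiliary parameter = mean.
_rng = random.Random(0)
SHOCKS = [_rng.gauss(0.0, 1.0) for _ in range(500)]  # reused for every theta


def simulate(theta):
    return [theta[0] + e for e in SHOCKS]


def score(mu, data):
    # Derivative of -0.5 * mean((r - mu)^2) with respect to mu.
    return [sum(r - mu[0] for r in data) / len(data)]
```

In this toy setting the objective is zero at the value of $\theta$ for which the simulated sample mean equals `mu_hat[0]`, and strictly positive elsewhere, so any standard minimizer can be applied to `emm_objective`.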

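In the just-identified case and under Assumptions 10-13, the asymptotic variance- 
covariance matrix of the indirect inference estimator simplifies to 

$$\Sigma = \left(1 + \frac{1}{S}\right) \left\{ \frac{\partial^2 Q_\infty}{\partial \theta\, \partial \mu'}[\mu(\theta^*), \theta^*]\; I_0^{-1}\; \frac{\partial^2 Q_\infty}{\partial \mu\, \partial \theta'}[\mu(\theta^*), \theta^*] \right\}^{-1},$$

as shown in Gourieroux and Monfort (1996). Note that the choice of the weighting matrix 
$W_T$ does not affect the asymptotic variance of the indirect inference estimator in the exactly 
identified case. 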
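
As a quick check on the structure of this variance, consider the scalar case $p = 1$. The shorthand $B$ for the cross-derivative is introduced here only for illustration, and the formula is the standard just-identified expression of Gourieroux and Monfort (1996), with $S$ denoting the simulation multiple that appears in $R_{ST}$.

```latex
% Scalar case p = 1, with B := \partial^2 Q_\infty / \partial\mu\,\partial\theta
% evaluated at [\mu(\theta^*), \theta^*] (B is notation assumed for this example).
\Sigma = \left(1 + \frac{1}{S}\right) \left( B \, I_0^{-1} \, B \right)^{-1}
       = \left(1 + \frac{1}{S}\right) \frac{I_0}{B^2}.
% The simulation noise inflates the variance by the factor 1 + 1/S:
% S = 1 doubles it, S = 10 adds 10 percent, and S \to \infty removes it.
```

The weighting matrix does not appear in this expression, which is consistent with the remark on the exactly identified case.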

In practice, we can estimate $I_0$ and $\frac{\partial^2 Q_\infty}{\partial \mu\, \partial \theta'}[\mu(\theta^*), \theta^*]$ in the following way. 

Assumption 13 (Decomposable Score) The score function can be written as: 

$$\frac{\partial Q_T}{\partial \mu}(\mu, R_T) = \frac{1}{T} \sum_{t=1}^{T} \psi(r_t \mid R_{t-1}; \mu)$$

for all $R_T$ and $\mu$. 

Note that Assumption 13 is satisfied by the median-based and the third moment-based 
indirect inference estimators considered in section 4. 

By Assumption 13, the auxiliary estimator satisfies the first-order condition: 

$$\frac{\partial Q_T}{\partial \mu}(\hat\mu_T, R_T) = \frac{1}{T} \sum_{t=1}^{T} \psi(r_t \mid R_{t-1}; \hat\mu_T) = 0. \qquad \text{(B.5)}$$

We estimate $I_0$ by the Newey and West (1987) variance-covariance matrix: 

$$\hat{I} = \hat\Gamma_0 + \sum_{v=1}^{\tau} \left(1 - \frac{v}{\tau + 1}\right) \left(\hat\Gamma_v + \hat\Gamma_v'\right), \qquad \text{(B.6)}$$

where $\hat\Gamma_v = T^{-1} \sum_{t=v+1}^{T} \psi(r_t \mid R_{t-1}; \hat\mu_T)\, \psi(r_{t-v} \mid R_{t-v-1}; \hat\mu_T)'$. All the results reported in the paper 
are based on $\tau = 10$ lags. We approximate $\frac{\partial^2 Q_\infty}{\partial \mu\, \partial \theta'}[\mu(\theta^*), \theta^*]$ by 

and obtain a finite-sample estimate of the asymptotic variance-covariance matrix $\Sigma$. 
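
The Newey and West (1987) estimator in (B.6) is straightforward to compute once the score contributions are available. The following pure-Python sketch assumes that `psi` is a list of $T$ score vectors $\psi(r_t \mid R_{t-1}; \hat\mu_T)$, each stored as a list of length $p$; the function name and data layout are choices made for this illustration.

```python
def newey_west(psi, lags):
    """Bartlett-weighted long-run variance of a sequence of score vectors.

    psi  : list of T vectors (lists of length p), one per observation
    lags : truncation lag tau, so that weights are 1 - v / (tau + 1)
    """
    T = len(psi)
    p = len(psi[0])

    def gamma(v):
        # Gamma_v = T^{-1} * sum_{t=v+1}^{T} psi_t psi_{t-v}'
        g = [[0.0] * p for _ in range(p)]
        for t in range(v, T):
            for i in range(p):
                for j in range(p):
                    g[i][j] += psi[t][i] * psi[t - v][j]
        return [[x / T for x in row] for row in g]

    est = gamma(0)
    for v in range(1, lags + 1):
        w = 1.0 - v / (lags + 1.0)
        g = gamma(v)
        for i in range(p):
            for j in range(p):
                # Add w * (Gamma_v + Gamma_v') entry by entry.
                est[i][j] += w * (g[i][j] + g[j][i])
    return est
```

With `lags=0` the function reduces to the sample second-moment matrix $\hat\Gamma_0$; the setting reported in the text corresponds to `lags=10`.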